Search Results: "moth"

12 February 2023

Russell Coker: T320 iDRAC Failure and new HP Z640

The Dell T320

Almost 2 years ago I made a Dell PowerEdge T320 my home server [1]. It was a decent upgrade from the PowerEdge T110 II that I had used previously. One motivation for the upgrade was that I needed more RAM, and the PowerEdge T1xx series use unbuffered ECC RAM which is unreasonably expensive, with the DIMMs tending to be smaller (no Load Reduced DIMMs) and only 4 slots available. As I had bought two T320s I put all the RAM in a single server for a total of 96G, put some cheap DIMMs in the other one, and sold it with 48G. The T320 has all the server reliability features including hot-swap redundant PSUs and hot-swap hard drives. One thing it doesn't have redundancy on is the motherboard management system known as iDRAC.

3 days ago my suburb had a power outage, and when power came back on the T320 gave an error message about a failure to initialise the iDRAC and put all the fans on maximum speed, which is extremely loud. When a T320 is running in a room that's not particularly hot and it doesn't have SAS disks it's a very quiet server, one of the quietest I've ever owned. When it goes into emergency cooling mode due to iDRAC failure it's loud enough to be heard from the other end of the house with doors closed in between.

Googling this failure gave a few possible answers. One was some combination of booting with the iDRAC button held down, turning the system off for a while, booting with the iDRAC button held down again, etc (this didn't work). One was putting an iDRAC firmware file on the SD card so the iDRAC could automatically load it (which I tested even though I didn't have the flashing LED which indicates that it is likely to work, but it didn't do anything). The last was to enable the serial console and configure the iDRAC to load new firmware via TFTP; I didn't get an iDRAC message from the serial console, just the regular BIOS stuff. So it looks like I'll have to sell the T320 for parts or find someone who wants to run it in its current form. Currently to boot it I have to press F1 a few times to bypass BIOS messages (someone on the Internet reported making a device to key-jam F1). Then when it boots it's unreasonably loud, but apparently if you are really keen you can buy fans that have temperature sensors to control their own speed and bypass the motherboard control (one possible software workaround is sketched below). I'd appreciate any advice on how to get this going. At this stage I'm not going to go back to it, but if I can get it working properly I can sell it for a decent price.

The HP Z640

I've replaced the T320 with a HP Z640 workstation with 32G of RAM which I had recently bought to play with Stable Diffusion. There were hundreds of Z640 workstations with NVidia Quadro M6000 GPUs going on eBay for under $400 each; it looked like a company that did a lot of ML work had either gone bankrupt or upgraded all their employees' systems. The price for the systems was surprisingly cheap, as at regular eBay prices the GPU and the RAM alone go for about the same price as the whole system. It turned out that Stable Diffusion didn't like the video card in my setup for unknown reasons, but also that the E5-1650v3 CPU could render an image in 15 minutes, which is fast enough to test it out but not fast enough for serious use. I had been planning to blog about that.

When I bought the T320 server the DDR3 Registered ECC RAM it uses cost about $100 for 8*8G DIMMs, with 16G DIMMs being much more expensive. Now the DDR4 Registered ECC RAM used by my Z640 goes for about $120 for 2*16G DIMMs.
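On the fan problem mentioned above: one workaround I have seen discussed (a sketch only, which I have not verified on a T320 with a failed iDRAC) is to take over fan control with raw IPMI commands. The raw codes are community-documented for PowerEdge machines of this era, and they assume the BMC still answers IPMI requests, which a dead iDRAC may not:

  # community-documented raw IPMI fan control for PowerEdge servers of this era;
  # this only works if the BMC (iDRAC) still responds to IPMI at all
  ipmitool raw 0x30 0x30 0x01 0x00        # switch to manual fan control
  ipmitool raw 0x30 0x30 0x02 0xff 0x14   # set all fans to 0x14 = 20% duty cycle
  ipmitool raw 0x30 0x30 0x01 0x01        # return to automatic control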
In the near future I'll upgrade the Z640 to 64G of RAM. It's disappointing that the Z640 only has 4 DIMM sockets per CPU, so if you get a single-CPU version (as I did) and don't get the really expensive Load Reduced RAM then you are limited to 64G. So the supposed capacity benefit of going from DDR3 to DDR4 doesn't seem to apply to this upgrade.

The Z640 I got has 4 bays for hot-swap SAS/SATA 2.5" SSD/HDDs and 2 internal bays for 3.5" hard drives. The T320 has 8*3.5" hot swap bays and I had 3 hard drives in them in a BTRFS RAID-10 configuration. Currently I've got one hard drive attached via USB, but that's obviously not a long-term solution. The 3 hard drives are 4TB; they have worked since 4TB was a good size. I have a spare 8TB disk so I could buy a second ($179 for a shingled HDD) to make an 8TB RAID-1 array. The other option is to pay $369 for a 4TB SSD (or $389 for a 4TB NVMe + $10 for the PCIe card) to keep the 3 device RAID-10. As tempting as 4TB SSDs are, I'll probably get a cheap 8TB disk which will take capacity from 6TB to 8TB, and I could use some extra 4TB disks for backups.

I haven't played with the AMT/MEBX features on this system; I presume that they will work the same way as AMT/MEBX on the HP Z420 I've used previously [2].

Update: HP has free updates for the BIOS etc available here [3]. Unfortunately it seems to require loading a kernel module supplied by HP to do this. This is a bad thing; kernel code that isn't in the mainline kernel is either of poor quality or isn't licensed correctly.

I had to change my monitoring system to alert on temperatures over 100% of the high range, while on the T320 I had it set at 95% of high and never got warnings. This is disappointing: enterprise class gear running in a reasonably cool environment (ambient temperature of about 22C) should be able to run all CPU cores at full performance without hitting 95% of the high temperature level.
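As a footnote to the monitoring change, here is a minimal sketch of the alerting rule described above, assuming lm-sensors is installed and the drivers report a per-sensor high threshold (a real monitoring system would also distinguish sensors with the same label on different chips):

  # warn when any sensor exceeds 100% of its reported "high" threshold;
  # use cur[t] > 0.95 * max[t] for the 95% rule used on the T320
  sensors -u | awk '
    /_input:/ { split($1, a, "_"); cur[a[1]] = $2 }
    /_max:/   { split($1, a, "_"); max[a[1]] = $2 }
    END {
      for (t in cur)
        if (max[t] > 0 && cur[t] > max[t])
          printf "ALERT: %s at %sC exceeds high threshold %sC\n", t, cur[t], max[t]
    }'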

28 January 2023

Russ Allbery: Review: The Library of the Dead

Review: The Library of the Dead, by T.L. Huchu
Series: Edinburgh Nights #1
Publisher: Tor
Copyright: 2021
Printing: 2022
ISBN: 1-250-76777-6
Format: Kindle
Pages: 329
The Library of the Dead is the first book in a post-apocalyptic (sort of) urban fantasy series set in Edinburgh, written by Zimbabwean author (and current Scotland resident) T.L. Huchu.

Ropa is a ghosttalker. This means she can see people who have died but are still lingering because they have unfinished business. She can stabilize them and understand what they're saying with the help of her mbira. At the age of fourteen, she's the sole source of income for her small family. She lives with her grandmother and younger sister in a caravan (people in the US call it an RV), paying rent to an enterprising farmer turned landlord.

Ropa's Edinburgh is much worse off than ours. Everything is poorer, more run-down, and more tenuous, but other than a few hints about global warming, we never learn the history. It reminded me a bit of the world in Octavia Butler's Parable of the Sower in the feel of civilization crumbling without a specific cause. Unlike that series, The Library of the Dead is not about the collapse or responses to it. The partial ruin of the city is the mostly unremarked backdrop of Ropa's life.

Much of the book follows Ropa's daily life carrying messages for ghosts and taking care of her family. She does discover the titular library when a wealthier friend who got a job there shows it off to her, but it has no significant role in the plot. (That was disappointing.) The core plot, once Ropa is convinced by her grandmother to focus on it, is the missing son of a dead woman, who turns out to not be the only missing child.

This is urban fantasy with the standard first-person perspective, so Ropa is the narrator. This style of book needs a memorable protagonist, and Ropa is certainly that. She's a talker who takes obvious delight in narrating her own story alongside a constant patter of opinions, observations, and Scottish dialect. Ropa is also poor. That last may not sound that notable; a lot of urban fantasy protagonists are not well-off. But most of them feel culturally middle-class in a way that Ropa does not. Money may be a story constraint in other books, but it rarely feels like a life constraint and experience the way it does here. It's hard to describe the difference in tone succinctly, since it's a lot of small things: the constant presence of money concerns, the frustration of possessions that are stolen or missing and can't be replaced, the tedious chores one has to do when there's no money, even the language and vulgarity Ropa uses. This is rare in fantasy and excellent characterization work.

Given that, I am still frustrated with myself over how much I struggled with Ropa as a narrator. She's happy to talk about what is happening to her and what she's learning about (she listens voraciously to non-fiction while running messages), but she deflects, minimizes, or rushes past any mention of what she's feeling. If you don't like the angst that's common from urban fantasy protagonists, this may be the book for you. I have complained about that angst before, and therefore feel like this should have been the book for me, but apparently I need a minimum level of emotional processing and introspection from the narrator. Ropa is utterly unwilling to do any of that. It's possible to piece together what she's feeling and worrying about, but the reader has to rely on hints and oblique comments that she passes over quickly. It didn't help that Ropa is not interested in the same things in her world that I was interested in.
She's not an unreliable narrator in the conventional sense; she doesn't lie to the reader or intentionally hide information. And yet, the experience of reading this book was, for me, similar to reading a book with an unreliable narrator. Ropa consistently refused to look at what I wanted her to look at or think about what I wanted her to think about. For example, when she has an opportunity to learn magic through books from the titular library, her initial enthusiasm is infectious. Huchu does a great job showing the excitement of someone who likes new ideas and likes telling other people about the neat things she just learned. But when things don't work the way she expected from the books, she doesn't follow up, experiment, or try to understand why. When her grandmother tries to explain something to her from a different angle, she blows her off and refuses to pay attention. And when she does get magic to work, she never tries to connect that to her previous understanding. I kept waiting for Ropa to try to build her own mental model of magic, but she would only toy with an idea for a few pages and then put it down and never mention it again.

This is not a fault in the book, just a mismatch between the book and what I wanted to read. All of this is consistent with Ropa's defensive strategies, emotional resiliency, and approach to understanding the world. (I strongly suspect Huchu was giving Ropa some ADHD characteristics, and if so, I think he got it spot on.)

Given that, I tried to pivot to appreciating the characterization and the world, but that ran into another mismatch I had with this book, and the reason why I passed on it when it initially came out. I tend to avoid fantasy novels about ghosts. This is not because I mind ghosts themselves, but I've learned from experience that authors who write about ghosts usually also write about other things that I don't want to read about. That unfortunately was the case here; The Library of the Dead was too far into horror for me. There's child abuse, drugs, body horror, and similar nastiness here, more than I wanted in my head. Ropa's full-speed-ahead attitude and refusal to dwell on anything made it a bit easier to read, but it was still too much for me.

Ropa is a great character who is refreshingly different than the typical urban fantasy protagonist, and the few hints of the magical library and world background we get were intriguing. This book was not for me, but I can see why other people will love it.

Followed by Our Lady of Mysterious Ailments.

Rating: 6 out of 10

9 January 2023

Gunnar Wolf: Back to Xochicalco

In Mexico, we have the great luck to live among vestiges of long-gone cultures, some that were conquered and in some way got adapted and survived into our modern, mostly-West-European-derived society, and some that thrived but disappeared many more centuries ago. And although not everybody feels the same way, my family has always enjoyed visiting archaeological sites, both when I was a child and today.

Some of the regulars that follow this blog (or its syndicators) will remember Xochicalco, as it was the destination we chose for the daytrip back in the day, at DebConf6 (May 2006). This weekend, my mother suggested we go there, as being Winter, the weather is quite pleasant: we were at about 25°C, and by the hottest months of the year it can easily reach 10 more; the place lacks shadows, like most archaeological sites, and it does get quite tiring nevertheless!

Xochicalco is quite unique among our archaeological sites, as it was built as a conference city: people came from cultures spanning all of Mesoamerica to debate and homogenize the calendars used in the region. The first photo I shared here is by the Quetzalcóatl temple, where each of the four sides shows people from different cultures (the styles in which they are depicted follow their local self-representations), encodes equivalent dates in the different calendrical systems, and sits alongside representations of the god of knowledge, the feathered serpent, Quetzalcóatl.

It was a very nice day out. And, of course, it brought back memories of my favorite conference: visiting the site of a very important conference.

27 December 2022

Steve Kemp: A summary of the year.

This year had a lot of things happen in it, world-wide, as is always the case. Being more selfish, here are the things I remember, in brief unless there are comments/questions.

On the topic of Finnish: I'm getting pretty damn good at understanding, albeit less good at speaking. Finnish is all about the suffixes, and most of this is regular, so you can be childless via the "ton" suffix: lapsiton is "lapsi" (child) + "ton" (-less). The hard part in communication is thus twofold.

Our child turned six recently, and most of the year was spent doing things with him, for him, and to him. He's on the verge of learning to read (English and Finnish), he's interested in maths, and he completes little puzzles freely and happily. He likes to help with Sudoku, for example, and not just the child versions. In the past couple of weeks I let him play Super Mario & Super Mario Bros 3 on a NES Classic, and he dies constantly, with a smile on his face. But he does love to tell me what to do when he watches me play! He's learned to ice-skate, and ski, and almost learned to swim. (I'll say he can swim 3m inside a pool, without aid, but then he starts to sink.) We've got a couple of regular rituals with each other, including going to sauna every week or two, and other similar things. He's gotten more interested in helping me cook, and his mother too. (My wife and I live in separate houses..)

I guess the next big milestone will be him walking to school by himself, which will start next year. As things stand I wake up early, go over to his house, and do all the morning-things, before I take him there. I expect I'll still want to go there, give him his breakfast, his medicine, and help him get dressed. After that I guess I kick him out, and he makes his own way there. Happily the walk to school is a few hundred meters, and doesn't involve crossing any roads. But of course it does bring other complications: if he's not collected, and walks home himself, then he needs a key to one/other house, and there's the potential need for a phone to say "I'm late", "I'm lost", or for us to say "Where are you?".

Anyway .. interesting year .. good year. Despite everything else.

23 December 2022

Louis-Philippe Véronneau: 2022 A Musical Retrospective

With the end of the year approaching fast, I thought putting my year in retrospective via music would be a fun thing to do.

Albums

In 2022, I added 51 new albums to my collection, nearly one a week! I listed them below in the order in which I acquired them. I purchased most of these albums when I could and borrowed the rest at libraries. If you want to browse through them, I added links to the album covers pointing either to websites where you can buy them or to Discogs when digital copies weren't available [1]. Browsing through the albums, I can see my tastes really shifted a lot in the last few years. I used to listen to a lot of Hip-Hop, but the recent trends in this genre [2] really turn me off. In fact, it seems I didn't add a single Hip-Hop album to my collection this year... Metal also continues to dominate the list. Many thanks to Angry Metal Guy for being the best metal reviewing website out there.

Concerts

2022 was also a big change for me, as I started going to many more concerts than I previously did. metalfinder has been working great and I'm really happy with it. Here are the concerts I went to in 2022: I'm looking forward to continuing to go to a lot of concerts in 2023!

  1. Some of the albums, especially the Oi! ones, are pretty underground. For most of those, I actually have physical copies I bought and ripped.
  2. Mostly mumble rap, beats that are less and less sample-based, extreme commercialisation and lyrics that are less and less political and engaged.

20 December 2022

Russ Allbery: Review: Tess of the Road

Review: Tess of the Road, by Rachel Hartman
Series: Tess of the Road #1
Publisher: Random House
Copyright: 2018
Printing: 2022
ISBN: 1-101-93130-2
Format: Kindle
Pages: 536
Tess of the Road is the first book of a YA fantasy duology set in the same universe as Seraphina and Shadow Scale. It's hard to decide what to say about reading order (and I now appreciate the ambiguous answers I got). Tess of the Road is a sequel to Seraphina and Shadow Scale in the sense that there are numerous references to the previous duology, but it has a different protagonist and different concerns. You don't need to read the other duology first, but Tess of the Road will significantly spoil the resolution of the romance plot in Seraphina, and it will be obvious that you've skipped over background material. That said, Shadow Scale is not a very good book, and this is a much better book. I guess the summary is this: if you're going to read the first duology, read it first, but don't feel obligated to do so.

Tess was always a curious, adventurous, and some would say unmanageable girl, nothing like her twin. Jeanne is modest, obedient, virtuous, and practically perfect in every way. Tess is not; after a teenage love affair resulting in an out-of-wedlock child and a boy who disappeared rather than marry her, their mother sees no alternative but to lie about which of the twins is older. If Jeanne can get a good match among the nobility, the family finances may be salvaged. Tess's only remaining use is to help her sister find a match, and then she can be shuffled off to a convent.

Tess throws herself into court politics and does exactly what she's supposed to. She engineers not only a match, but someone Jeanne sincerely likes. Tess has never lacked competence. But this changes nothing about her mother's view of her, and Tess is depressed, worn, and desperately miserable in Jeanne's moment of triumph. Jeanne wants Tess to stay and become the governess of her eventual children, retaining their twin bond of the two of them against the world. Their older sister Seraphina, more perceptively, tries to help her join an explorer's expedition. Tess, in a drunken spiral of misery, insults everyone and runs away, with only a new pair of boots and a pack of food.

This is going to be one of those reviews where the things I didn't like are exactly the things other readers liked. I see why people loved this book, and I wish I had loved it too. Instead, I liked parts of it a great deal and found other parts frustrating or a bit too off-putting. Mostly this is a preference problem rather than a book problem.

My most objective complaint is the pacing, which was also my primary complaint about Shadow Scale. It was not hard to see where Hartman was going with the story, I like that story, I was on board with going there, but getting there took for-EV-er. This is a 536 page book that I would have edited to less than 300 pages. It takes nearly a hundred pages to get Tess on the road, and while some of that setup is necessary, I did not want to wallow in Tess's misery and appalling home life for quite that long. A closely related problem is that Hartman continues to love flashbacks. Even after Tess has made her escape, we get the entire history of her awful teenage years slowly dribbled out over most of the book. Sometimes this is revelatory; mostly it's depressing. I had guessed the outlines of what had happened early in the book (it's not hard), and that was more than enough motivation for me, but Hartman was determined to make the reader watch every crisis and awful moment in detail. This is exactly what some readers want, and sometimes it's even what I want, but not here.
I found the middle of the book, where the story is mostly flashbacks and flailing, to be an emotional slog. Part of the problem is that Tess has an abusive mother and goes through the standard abuse victim process of being sure that she's the one who's wrong and that her mother is justified in her criticism. This is certainly realistic, and it eventually led to some satisfying catharsis as Tess lets go of her negative self-image. But Tess's mother is a narcissistic religious fanatic with a persecution complex and not a single redeeming quality whatsoever, and I loathed reading about her, let alone reading Tess tiptoeing around making excuses for her. The point of this in the story is for Tess to rebuild her self-image, and I get it, and I'm sure this will work for some readers, but I wanted Tess's mother (and the rest of her family except her sisters) to be eaten by dragons. I do not like the emotional experience of hating a character in a book this much.

Where Tess of the Road is on firmer ground is when Tess has an opportunity to show her best qualities, such as befriending a quigutl in childhood and, in the sort of impulsive decision that shows her at her best, learning their language. (For those who haven't read the previous books, quigutls are a dog-sized subspecies of dragon that everyone usually treats like intelligent animals, although they're clearly more than that.) Her childhood quigutl friend Pathka becomes her companion on the road, which both gives her wanderings some direction and adds some useful character interaction.

Pathka comes with a plot that is another one of those elements that I think will work for some readers but didn't work for me. He's in search of a Great Serpent, a part of quigutl mythology that neither humans nor dragons pay attention to. That becomes the major plot of the novel apart from Tess's emotional growth. Pathka also has a fraught relationship with his own family, which I think was supposed to parallel Tess's relationships but never clicked for me. I liked Tess's side of this relationship, but Pathka is weirdly incomprehensible and apparently fickle in ways that I found unsatisfying. I think Hartman was going for an alien tone that didn't quite work for me.

This is a book that gets considerably better as it goes along, and the last third of the book was great. I didn't like being dragged through the setup, but I loved the character Tess became. Once she reaches the road crew, this was a book full of things that I love reading about. The contrast between her at the start of the book and the end is satisfying and rewarding. Tess's relationship with her twin Jeanne deserves special mention; their interaction late in the book is note-perfect and much better than I had expected. Unfortunately, Tess of the Road doesn't have a real resolution. It's only the first half of Tess's story, which comes back to that pacing problem. Ah well.

I enjoyed this but I didn't love it. The destination was mostly worth the journey, but I thought the journey was much too long and I had to spend too much time in the company of people I hated far more intensely than was comfortable. I also thought the middle of the book sagged, a problem I have now had with two out of three of Hartman's books. But I can see why other readers with slightly different preferences loved it. I'm probably invested enough to read the sequel, although I'm a bit grumbly that the sequel is necessary.

Followed by In the Serpent's Wake.

Rating: 7 out of 10

12 December 2022

Russ Allbery: Review: The Unbroken

Review: The Unbroken, by C.L. Clark
Series: Magic of the Lost #1
Publisher: Orbit
Copyright: March 2021
ISBN: 0-316-54267-9
Format: Kindle
Pages: 490
The Unbroken is the first book of a projected fantasy trilogy. It is C.L. Clark's first novel.

Lieutenant Touraine is one of the Sands, the derogatory name for the Balladairan Colonial Brigade. She, like the others of her squad, is a conscript soldier, kidnapped by the Balladairan Empire from its colonies as a child and beaten into "civilized" behavior by Balladairan training. They fought in the Balladairan war against the Taargens. Now, they've been reassigned to El-Wast, capital city of Qazāl, the foremost of the southern colonies. The place where Touraine was born, from which she was taken at the age of five. Balladaire is not France and Qazāl is not Algeria, but the parallels are obvious and strongly implied by the map and the climates.

Touraine and her squad are part of the forces accompanying Princess Luca, the crown princess of the Balladairan Empire, who has been sent to take charge of Qazāl and quell a rebellion. Luca's parents died in the Withering, the latest round of a recurrent plague that haunts Balladaire. She is the rightful heir, but her uncle rules as regent and is reluctant to give her the throne. Qazāl is where she is to prove herself. If she can bring the colony in line, she can prove that she's ready to rule: her birthright and her destiny.

The Qazāli are uninterested in being part of Luca's grand plan of personal accomplishment. She steps off her ship into an assassination attempt, foiled by Touraine's sharp eyes and quick reactions, which brings the Sand to the princess's attention. Touraine's reward is to be assigned the execution of the captured rebels, one of whom recognizes her and names her mother before he dies. This sets up the core of the plot: Qazāli rebellion against an oppressive colonial empire, Luca's attempt to use the colony as a political stepping stone, and Touraine caught in between.

One of the reasons why I am happy to see increased diversity in SFF authors is that the way we tell stories is shaped by our cultural upbringing. I was taught to tell stories about colonialism and rebellion in a specific ideological shape. It's hard to describe briefly, but the core idea is that being under the rule of someone else is unnatural as well as being an injustice. It's a deviation from the way the world should work, something unexpected that is inherently unstable. Once people unite to overthrow their oppressors, eventual success is inevitable; it's not only right or moral, it's the natural path of history. This is what you get when you try to peel the supremacy part away from white supremacy but leave the unshakable self-confidence and bedrock assumption that the universe cares what we think. We were also taught that rebellion is primarily ideological. One may be motivated by personal injustice, but the correct use of that injustice is to subsume it into concepts such as freedom and democracy. Those concepts are more "real" in some foundational sense, more central to the right functioning of the world, than individual circumstance. When the now-dominant group tells stories of long-ago revolution, there is no personal experience of oppression and survival in which to ground the story; instead, it's linked to anticipatory fear in the reader, to the idea that one's privileges could be taken away by a foreign oppressor and that the counter to this threat is ideological unity. Obviously, not every white fantasy author uses this story shape, but the tendency runs deep because we're taught it young.
You can see it everywhere in fantasy, from Lord of the Rings to Tigana.

The Unbroken uses a much different story shape, and I don't think it's a coincidence that the author is Black. Touraine is not sympathetic to the Qazāli. These are not her people and this is not her life. She went through hell in Balladairan schools, but she won a place, however tenuous. Her personal role model is General Cantic, the Balladairan Blood General who was also one of her instructors. Cantic is hard as nails, unforgiving, unbending, and probably a war criminal, but also the embodiment of a military ethic. She is tough but fair with the conscript soldiers. She doesn't put a stop to their harassment by the regular Balladairan troops, but neither does she let it go too far. Cantic has power, she knows how to keep it, and there is a place for Touraine in Cantic's world. And, critically, that place is not just hers: it's one she shares with her squad. Touraine's primary loyalty is not to Balladaire or to Qazāl. It's to the Sands. Her soldiers are neither one thing nor the other, and they disagree vehemently among themselves about what Qazāl and their other colonial homes should be to them, but they learned together, fought together, and died together. That theme is woven throughout The Unbroken: personal bonds, third and fourth loyalties, and practical ethics of survival that complicate and contradict simple dichotomies of oppressor and oppressed. Touraine is repeatedly offered ideological motives that the protagonist in the typical story shape would adopt. And she repeatedly rejects them for personal bonds: trying to keep her people safe, in a world that is not looking out for them.

The consequence is that this book tears Touraine apart. She tries to walk a precarious path between Luca, the Qazāli, Cantic, and the Sands, and she falls off that path a lot. Each time I thought I knew where this book was going, there's another reversal, often brutal. I tend to be a happily-ever-after reader who wants the protagonist to get everything they need, so this isn't my normal fare. The amount of hell that Touraine goes through made for difficult reading, worse because much of it is due to her own mistakes or betrayals. But Clark makes those decisions believable given the impossible position Touraine is in and the lack of role models she has for making other choices. She's set up to fail, and the price of small victories is to have no one understand the decisions that she makes, or to believe her motives.

Luca is the other viewpoint character of the book (and yes, this is also a love affair, which complicates both of their loyalties). She is the heroine of a more typical genre fantasy novel: the outsider princess with a physical disability and a razor-sharp mind, ambitious but fair (at least in her own mind), with a trusted bodyguard advisor who also knew her father and a sincere desire to be kinder and more even-handed in her governance of the colony. All of this is real; Luca is a protagonist, and the reader is not being set up to dislike her. But compared to Touraine's grappling with identity, loyalty, and ethics, Luca is never in any real danger, and her concerns start to feel too calculated and superficial. It's hard to be seriously invested in whether Luca proves herself or gets her throne when people are being slaughtered and abused. This, I think, is the best part of this book.
Clark tells a traditional ideological fantasy of learning to be a good ruler, but she puts it alongside a much deeper and more complex story of multi-faceted oppression. She has the two protagonists fall in love with each other and challenges them to understand each other, and Luca does not come off well in this comparison. Touraine is frustrated, impulsive, physical, and sometimes has catastrophically poor judgment. Luca is analytical and calculating, and in most ways understands the political dynamics far better than Touraine. We know how this story usually goes: Luca sees Touraine's brilliance and lifts her out of the ranks into a role of importance and influence, which Touraine should reward with loyalty. But Touraine's world is more real, more grounded, and more authentic, and both Touraine and the reader know what Luca could offer is contingent and comes with a higher price than Luca understands.

(Incidentally, the cover of The Unbroken, designed by Lauren Panepinto with art by Tommy Arnold, is astonishingly good at capturing both Touraine's character and the overall feeling of the book. Here's a larger version.)

The writing is good but uneven. Clark loves reversals, and they did keep me reading, but I think there were too many of them. By the end of the book, the escalation of betrayals and setbacks was more exhausting than exciting, and I'd stopped trusting anything good would last. (Admittedly, this is an accurate reflection of how Touraine felt.) Touraine's inner monologue also gets a bit repetitive when she's thrashing in the jaws of an emotional trap. I think some of this is first-novel problems of over-explaining emotional states and character reasoning, but these problems combine to make the book feel a bit over-long. I'm also not in love with the ending. It's perhaps the one place in the book where I am more cynical about the politics than Clark is, although she does lay the groundwork for it. But this book is also full of places small and large where it goes a different direction than most fantasy and is better for it. I think my favorite small moment is Touraine's quiet refusal to defend herself against certain insinuations. This is such a beautiful bit of characterization; she knows she won't be believed anyway, and refuses to demean herself by trying.

I'm not sure I can recommend this book unconditionally, since I think you have to be in the mood for it, but it's one of the most thoughtful and nuanced looks at colonialism and rebellion I can remember seeing in fantasy. I found it frustrating in places, but I'm also still thinking about it. If you're looking for a political fantasy with teeth, you could do a lot worse, although expect to come out the other side a bit battered and bruised.

Followed by The Faithless, and I have no idea where Clark is going to go with the second book. I suppose I'll have to read and find out.

Content note: In addition to a lot of violence, gore, and death, including significant character death, there's also a major plague. If you're not feeling up to reading about panic caused by contagious illness, proceed with caution.

Rating: 7 out of 10

8 December 2022

Shirish Agarwal: Wayland, Hearing aids, Multiverse & Identity

Wayland

First up, I read Antoine Beaupré's Wayland to Sway migration with interest. While he said it's done and dusted or something similar, the post shows there's still quite a ways to go. I wouldn't say it's done till it's integrated so well that a person installs it and doesn't really need to fiddle with config files as an average user. For specific use-cases you may need to, but that should be outside of a normal user (layperson) experience.

I have been using MATE for a long, long time and truth be told have been very happy with it. The only thing I found about Wayland on MATE is this discussion, or rather this entry. The roadmap on Ubuntu MATE is also quite iffy. The MATE Wayland entry on the Debian wiki perhaps also needs an update, but I don't know much, as the latest update it shares is from 2019 and it's 2022. One thing to note, at least according to Antoine: things should be better as and when it gets integrated, even on legacy hardware. I would be interested to know how it would work on old desktops and laptops rather than new ones, or is there some barrier?

I, for one, would have liked to see or know why lightdm doesn't work on Wayland and whether there's support coming (a quick check is sketched at the end of this section). From what little I know, lightdm is much lighter than gdm3, doesn't require much memory, and from what little I have experienced works very well with MATE. I have been using it since 2015/16, although the Debian changelog tells me that it has been present since 2011. I was hoping to see if there was a Wayland-specific mailing list, something like debian-wayland, but apparently there's not :(. Searching for "mate desktop wayland" (I tried a few other variations on the keywords) fails to find any meaningful answer :(. FWIW, and I don't know the reason why, but the Arch wiki never fails to amaze me. Interestingly, it just says "No" for MATE. I probably will contact upstream in the coming days to know what their plans are, and hopefully they will document their short-term and long-term plans for integrating Wayland, or, if there is something more recent they have documented elsewhere, get that update onto the Debian wiki so people know.

The other interesting thread I read was Russell Coker's Thinkpad X1 Carbon Gen5 entry. I would be in the market in a few months to find/buy a Thinkpad, but probably AMD rather than Intel, partly because of recent history with Intel and partly because AMD has a bit of an edge over Intel as far as graphics is concerned. I wonder why Russell was looking into Intel and not AMD. Would be interested to know why. Any specific reason?
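On the lightdm question above, one quick check (my own sketch, not something from the linked pages) is to look at which session files are installed; display managers such as lightdm and gdm3 can only offer Wayland sessions that have a .desktop file in the wayland-sessions directory:

  # X11 sessions known to the display manager (MATE shows up here)
  ls /usr/share/xsessions/
  # Wayland sessions; if nothing here mentions MATE, then no display manager
  # can offer a MATE-on-Wayland session, regardless of lightdm's own support
  ls /usr/share/wayland-sessions/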

Hearing Aids

I finally bought hearing aids a couple of weeks back and have been practicing using them. I was able to have quite a few conversations, although I'm still not able to hear music clearly; it is still a far cry from before, and for the better. I am able to have conversations with people and also reply, and they do not have to make that extra effort that they used to need, which makes things easier for everybody. The one I bought is at the starting range, although hearing aids go all the way to 8 lakhs for a pair (INR 800,000), the more expensive ones having WiFi, Bluetooth and more channels; it all depends on how much one can afford. And AFAIK there is not a single Indian manufacturer who is known in this business.

One thing I did notice is that while the hearing aids are remarkably sturdy if they fall down, as they are small, you have to be careful of both dust and water. That does make life a bit difficult, as my house and city both get sand quite a bit every day. I don't think they made any India-specific changes; if they had, it would probably make things better. I haven't yet looked at it, but it may be possible to hack them remotely. There may or may not be security issues involved; I probably will try once I have a bit more time and am a bit more comfortable, to see what I can find out. If I had bought them before, maybe I would have applied for the Debian event happening in Kerala; if nothing else, I would have been able to document what happened there in detail.

I probably will have to get a new motherboard for my desktop in a year or two, as quite a few motherboards now have WiFi (WiFi 6?) on the southbridge, I think. I will at least have a look in the new year and know more about what's been happening. For at least the last 2-3 years there has been a rumor, confirmed time and again, that the Tata Group has been in talks with multiple vendors to set up a chip fabrication and testing business, but to date they haven't been able to find one. They do keep on giving press conferences about it, but that's all they do :(. Just shared the latest one above.

The Long War and The Long Earth, by Terry Pratchett and Stephen Baxter (ISBN13: 9780062067777)

Last month there was also a second-hand books sale where I was lucky enough to get my hands on The Long War. But before I share about the book itself, I had a discussion with another of my friends and have to re-share part of that conversation. While the gentleman was adamant that non-fiction books are great, my point, as always, is that both are equal. As I shared perhaps on this blog itself, perhaps multiple times, I had seen a YT video in which a professor showed multiple physics textbooks, explained how they are and have been wrong, and kept them in a specific corner. He took the latest book, which he honestly said doesn't have any mistakes as far as he knows, and still kept it in that same corner, denoting that it is highly possible that future understanding will make the knowledge or understanding we have different. An example is physics in the nano world, which is different and basically upends our understanding of what we know.

Now as far as the book is concerned: remember Michael Crichton's Timeline? Now that book was originally written in the 1960s, while this one was written by both of these honorable gentlemen in 2013. So there is almost a 50+ year difference between the two books, and that shows in how they think about things. In this book, you no longer need a big machine, but have something called a stepper machine which is, say, similar to a cellphone in size, frame, thickness etc. In this one, the idea of the multiverse is also there, but done a tad differently. In this, we do not have other humans or copies of humans, but multiple earths that may have the same or different geography depending on how evolution happened. None of the multiverse earths have humans, but they have different species depending on the evolution that happened there. There are things called trolls, but they have a much different meaning and way about them than how most fantasy authors portray trolls. While they are big in this as well, they are as gentle as bears or rabbits. So the whole thing is about real estate and how humans have spread out on multiple earths, and the politics therein.

Interestingly, the story was trashed or given negative reviews on Goodreads. The sad part is/was that it was written and published in 2013, when perhaps the possibility of war or anything like that was very remote, especially in the States; but we are now in 2022, just had an insurrection happen, and a whole lot of Americans are radicalized, whether you see the left or the right depending on your ideology. An American did share a few weeks ago how some states are looking at Proportional Representation, and that should make both parties come more towards the center and be a bit more transparent. What was interesting to me is the fact that states have much more rights to do elections and electioneering the way they want, rather than a set model which everyone has in common, which is what happens in India. This also does poke holes into the whole Donald Trump stolen democracy drama, but that's a different story altogether.

One of the more interesting things I came to know is that there are 4 books in the Long Earth series, and this is the second book. I do not want to dwell on the characters themselves, as frankly speaking I haven't read all four books and it would be gross injustice on my part to talk about the characters. Did I enjoy reading the book? For sure.
What was interesting, and very true of human nature, is that even if we had the ability to have whole worlds to ourselves, we are bound to mess it up. And in that aspect, I don't think the authors are too far off the mark. If I had a whole world, wouldn't I try to exploit it to the best or worst of my ability? One of the more interesting topics in the book is the barter system they have thought of, called favors. If you are in multiple worlds, then having a currency, even fiat money, is of no use, and they have to find ways and means to trade with one another. The book also touches a bit on slavery, but only just, and doesn't really explore it as much as it could have.

Identity

Now this has many meanings to it. A couple of weeks ago, I saw a transgender meet. For the uninitiated, or rather people like me: basically it is about people who are born in one gender but do not identify with it, identifying instead with the other, and they express it first through their clothes and expression; the end of the journey perhaps is having the organs, but this may or may not be feasible, as such surgery is expensive and also not available everywhere. After section 377 was repealed a few years ago, we do have a third gender on forms as well as something called a Transgender Act, but how much the needle has moved in society is still a question. They were doing a roadshow near my house, hence I was able to talk with them with my new hearing aids, and while there was a lot of traffic I was able to understand some of their issues. For example, they find it difficult to get houses on rent, but then it is similar for bachelor guys or girls also. One could argue to what degree it is, and that perhaps maybe so. Also, there is a myth that they are somehow promiscuous, but that I believe is neither here nor there. Osho said an average person thinks about the opposite sex every few seconds or a minute. I am sure even Freud would have had similar ideas. So, if you look at it that way, everybody is promiscuous as far as thought is concerned. The other part is opportunity, but that again is a function of so many other things. Some people are able to attract a lot of people, others might not. And then whether they choose to act on that opportunity or not is another thing altogether. Another word that is or was used is gender fluid, but that too is iffy, as gender fluid may or may not mean transgender. Also, while watching a nature documentary a few days/weeks back, I came to know that trees have something like 18 odd genders. That just blows my mind and does re-question this whole idea of restricting sexuality and identity to only two, which seems somewhat regressive, at least to me. If we think humans are part of nature, then we need to open up perhaps a bit more.

But identity, as I shared above, has more than one meaning. Take citizenship: knowing, understanding and defining that one is born in India is even messier. I had come across this article a couple of months back. Now think about this: there have been studies and surveys about citizenship, and they say something like 60% of birth registrations are done in metro cities. Now metro cities are 10 in number as defined by the Indian state. But there are roughly an odd 4k cities in India and probably twice the number of villages, and those are conservative numbers, as we still don't record things meticulously, maybe due to the Indian oral tradition or just being lazy or both. One part is also that if you document people and villages and towns, then you are also obligated to give them some things as a state, and that perhaps is not what the Indian state wants. A small village in India could be anywhere from a few hundred people to a few thousand. And all the new interventions, whether PAN or Aadhaar, have just made holes rather than making things better. They are not inclusive but exclusive. And none of this takes into account Indian character and the way things are done in India. Consider most households, excluding the celebs (they are in a world of pain altogether when it comes to baby names, but then it's big business and that's an entirely different saga, so I'm not going to touch that).
I would use my individual case, as it is and seems to be something which is regular even today. I was given a nickname when I was 3 years old and given a name when I was 5-6, when I was put in school. I also came to know in school a few kids who didn't like their names, and a couple of them cajoled and actually changed their names while they were kids; most of us just stayed with what we got. I do remember sharing about "Nakushi" or something similar, a name given to a few girls in Maharashtra by their parents, where the state intervened and changed their names. But that too is another story in itself.

What I find most problematic is that the state seems to be blind, and this seems to be by design rather than a mistake. A couple of years back, Assam did something called the NRC (National Register of Citizens), and by the Govt.'s own account it was a failure of massive proportions. And they still want to bring in the CAA, screwing up Assam more. And this is the same Govt. that, when shown how incorrect it was, blamed it all on the High Court, and it's the same Govt. that shopped around for judges to put away somebody called Mr. Saibaba (an invalid 90-year-old adivasi) against whom the Govt. hasn't even a single proof to date. Apparently, they went to 6 judges who couldn't give the decision the Govt. wanted. All this info. is in the public domain. So the current ruling party, i.e. the BJP, just wants to make more divisions rather than taking people along, as they don't have answers on the economy, inflation, or the issues that people are facing.

One bright light has been Rahul Gandhi, who has been doing a padyatra (walking) from Kanyakumari to Kashmir and has had tremendous success, although mainstream media has shown almost nothing of what he is doing or why he is doing it. Not only did he have people following him, there are and were many who took his example and, using the same values of inclusiveness, are walking where they can. And this is not to do with just a political party but more with a political thought of inclusiveness: that we are one irrespective of what I believe, eat, wear etc. And that gentleman has been giving press conferences while our dear P.M., even after 8 years, doesn't have the guts to do a single press conference.

Before closing, I do want to take up another aspect. Rahul Gandhi's mother is an Italian, or was from Italy before she married. But for the BJP she is still Italian. Rishi Sunak, who has become the UK Prime Minister, they think of as Indian, and yet he has sworn his oath using the Queen's name. And the same goes for Canada Kumar (Akshay Kumar) and many others. How the right is able to be blind and deaf to whatever it chooses is beyond me. All these people have taken an oath in the name of the Queen and have to be loyal to her, or rather now King Charles III. The disconnect continues.

Russell Coker: Thinkpad X1 Carbon Gen5

Gen1

Since February 2018 I have been using a Thinkpad X1 Carbon Gen1 [1] as my main laptop. Generally I've been very happy with it: it's small and light, has good performance for web browsing etc, and with my transition to doing all compiles etc on servers it works well. When I wrote my original review I was unhappy with the keyboard, but I got used to that and found it to be reasonably good.

The things that I have found as limits on it are the display resolution, as 1600*900 isn't that great by modern standards (most phones are a lot higher resolution); the size (slightly too large for the pocket of my Scott eVest [2] jacket); and the lack of USB-C. Modern laptops can charge via USB-C/Thunderbolt while also doing USB and DisplayPort video over the same cable. USB-C monitors which support charging a laptop over the same cable as used for video input are becoming common (last time I checked the Dell web site, for many models of monitor there was a USB-C one that cost about $100 more). I work at a company with lots of USB-C monitors and docks, so being able to use my personal laptop with the same displays when on breaks is really handy. A final problem with the Gen1 is that it has a proprietary and unusual connector for the SSD, which means that a replacement SSD costs about what I paid for the entire laptop. Ever since the SSD gave a BTRFS checksum error I've been thinking of replacing it.

Choosing a Replacement

The Gen5 is the first Thinkpad X1 Carbon to have USB-C. For work I had used a Gen6 which was quite nice [3], but it didn't seem to offer much over the Gen5. So I started looking for cheap Thinkpad X1 Carbons of Gen5 or later.

A Cheap? Gen5

In July I saw an eBay advert for a Gen5 with FullHD display for $370 or nearest offer, with the downside being that the BIOS password had been lost. I offered $330 and the seller accepted; in retrospect that was unusually cheap and should have been a clue that I needed to do further investigation. It turned out that resetting the BIOS password is unusually difficult as it's in the TPM, so the system would only boot Windows. When I learned that, I should have sold the laptop to someone who wanted to run Windows and bought another. Instead I followed some instructions on the Internet about entering a wrong password multiple times to get to a password recovery screen; instead the machine locked up entirely and became unusable even for Windows (so don't do that). Then I looked for ways of fixing the motherboard. The cheapest was $75.25 for a replacement BIOS flash chip with a BIOS that didn't check the validity of passwords. The aim was to solder that on, set a new password (with any random text being accepted as the old password), then solder the old one back on for normal functionality. It turned out that I'm not good at fine soldering; after I had hacked at it, a friend diagnosed the chip and motherboard as probably both damaged (he couldn't get it going). The end solution was that my friend found a replacement motherboard for $170 from China. This gave a total cost of $575.25 for the laptop, which is more than the usual price of a Gen6 and more than I expected to pay. In the past, when I advocated buying second hand or refurbished laptops, people would ask what happens if you get one that doesn't work properly; the answer to that question is that I paid a lot less than the new cost of $2700+ for a Thinkpad X1 Carbon and got a computer that does everything I need.
One of the advantages of getting a cheap laptop is that I won't be so unhappy if I happen to drop it.

A Cheap Gen6

After the failed experiment with a replacement BIOS on the Gen5 I was considering selling it for scrap. So I bought a Gen6 from Australian Computer Traders via Amazon for $390 in August. The advert clearly stated that it was for a laptop with USB-C and Thunderbolt (Gen5+ features), but they shipped me a Gen4 that didn't even have USB-C. They eventually refunded me, but I will try to avoid buying from them again.

Finally Working

The laptop I now have has an i5-6300U CPU that rates 3242 on cpubenchmark.net. My Gen1 Thinkpad has an i7-3667U CPU that rates 2378 on cpubenchmark.net; note that the cpubenchmark.net people have rescaled their benchmark since my review of the Gen1 in 2018. So according to the benchmarks my latest laptop is about 36% faster for CPU operations. Not much of a difference when comparing systems manufactured in 2012 and 2017! According to the benchmarks, a medium to high end recent CPU will be more than 10* faster than the one in my Gen5 laptop, but such a CPU would cost more than my laptop cost.

The storage is a 256G NVMe device that can do sustained reads at 900MB/s (a rough way to measure this is sketched at the end of this post). That's not even twice as fast as the SSD in my Gen1 laptop, although NVMe is designed to perform better for small IO.

It has 2*USB-C ports, both of which can be used for charging, which is a significant benefit over the Gen6 I had for work in 2018 which only had one. I don't know why Lenovo made Gen6 machines that were lesser than Gen5 in such an important way. It can power my Desklab portable 4K monitor [4] but won't send a DisplayPort signal over the same USB-C cable. I don't know if this is a USB-C cable issue or some problem with the laptop recognising displays. It works nicely with Dell USB-C monitors and docks that power the laptop over the same cable as used for DisplayPort. Also the HDMI port works with 4K monitors, so at worst I could connect my Desklab monitor via a USB-C cable for power and HDMI for data.

The inability to change the battery without disassembly is still a problem, but hopefully USB-C connected batteries capable of charging such a laptop will become affordable in the near future, and I have had some practice at disassembling this laptop. It still has the Ethernet dongle annoyance, and of course the seller didn't include that. But USB ethernet devices are quite good and I have a few of them.

In conclusion, it's worth the $575.25 I paid for it and would have been even better value for money if I had been a bit smarter when buying. It meets the initial criteria of USB-C power and display and of fitting in my jacket pocket, as well as being slightly better than my old laptop in every other way.
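For anyone who wants to reproduce the sustained-read figure mentioned above, here is a rough sketch; hdparm only gives an approximation (a tool like fio allows more controlled testing), and the device name is an assumption:

  # approximate sequential read speed from the device, bypassing the page cache
  hdparm -t /dev/nvme0n1
  # cached read speed, for comparison
  hdparm -T /dev/nvme0n1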

19 November 2022

Joerg Jaspert: From QNAP QTS to TrueNAS Scale

History, Setup

So for quite some time I have had a QNAP TS-873x here, equipped with 8 Western Digital Red 10 TB disks plus 2 WD Blue 500G M2 SSDs. The QNAP itself has an AMD Embedded R-Series RX-421MD with 4 cores and was equipped with 48G RAM. Initially I had been quite happy; the system is nice. It was fast, it was easy to get running, and the setup of the things I wanted was simple enough. All in a web interface that tries to imitate a kind of workstation feeling and also tries to hide that it is actually a web interface. Naturally, with that amount of disks I had a RAID6 for the disks, plus RAID1 for the SSDs, configured as a big storage pool with the RAID1 as cache. Under the hood QNAP uses MDADM RAID and LVM (if you want, with thin provisioning), in some form of embedded Linux. The interface allows for regular snapshots of your storage with flexible enough schedules to create them, so it all appears pretty good.
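As a sketch of what that stack looks like from a shell (assuming SSH access to the QNAP and that the QTS shell ships the usual MD and LVM tools):

  cat /proc/mdstat         # the MDADM RAID6 (disks) and RAID1 (SSD cache) sets
  pvs && vgs && lvs        # the LVM pool and thin volumes layered on top
  df -h | grep snapshot    # the ever-mounted snapshot mountpoints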

QNAP slow Fast forward some time and it gets annoying. First off, you really should have regular RAID resyncs scheduled, and while you can set priorities on them and run them at low priority, they make the whole system feel very sluggish, which is quite annoying. And sure, a power failure (rare, but it can happen) means another full resync run. Also, it appears all of the snapshots are always mounted to some /mnt/snapshot/something place (df on the system gets quite unusable). Second, the reboot times. QNAP seems to be affected by the more features, fuck performance virus, bloating their OS with more and more features while completely ignoring performance. Every time they do an upgrade it feels worse. Lately reboot times went up to 10 to 15 minutes - and then it still hadn't started the virtual machines / docker containers one might run on it. Another 5 to 10 minutes for those. Opening the file explorer - ages spent calculating what to show. Trying to get the storage setup shown? Go get a coffee, but please fetch the beans directly from the plantation, or you are too fast. Annoying it was. And no, no broken disks or fans or anything; it all checks out fine.

Replace QNAPs QTS system So I started looking around for what to do. More RAM might help a little bit, but I already had 48G and the system itself appears to support only 64G maximum, so not much chance of it helping enough. The hardware is all fine and working, so the software needs to be changed. Sounds hard, but it turns out it is not.

TrueNAS And I found that multiple people had replaced the QNAP's own system with a TrueNAS installation and generally been happy. Looking further I found that TrueNAS has a variant called Scale - which is based on Debian. Doubly good, that, so I went off checking what I may need for it.

Requirements Heck, that was a step back. To install TrueNAS you need an HDMI output and a disk to put it on. The one that QTS uses is too small, so not an option.
QNAP's original internal USB drive (DOM)
So either use one of the SSDs that played cache (and should do so again in TrueNAS), or get the QNAP original replaced. HDMI out is simple: get a cheap card and put it into one of the two PCIe 4x slots, done. The disk thing looked more complicated, as QNAP uses some internal USB stick thing. Turns out it is just a USB stick that has an 8+1 pin connector. I couldn't find anything nice as a replacement, but hey, there are 9-pin to USB-A adapters.
A 9-pin to USB-A adapter
With that adapter, one can take some random M2 SSD and an M2-to-USB case, plus some cabling, and voila, we have a nice system disk.
The 9-pin adapter connected to the motherboard with a USB-A cable
Obviously there isn't a good place to put this SSD case and cable, but the QNAP case is large enough to find space, and some cable ties store it safely. There is enough space to route the cable from the side, where the mainboard is, to the place I mounted it, so all is fine.
The SSD mounted in its external case (the video card is also visible)
The next best M2 SSD was a Western Digital Red with 500G - and while this is WAY too much for TrueNAS, it works. And hey, only using a tiny fraction? Oh, so many more cells available internally to use when others break. Or something. Together with the Asus card mounted I was able to install TrueNAS. Which is simple: their installer is easy enough to follow, just make sure to select the right disk to put it on.

Preserving data during the move Switching from QNAP QTS to TrueNAS Scale means changing from MDADM RAID with LVM and ext4 on top to ZFS, and as such all data on it gets erased. So a backup first is helpful, and I got myself two external Seagate USB disks of 6TB each - enough for the data I wanted to keep. Copying things all over took ages, especially as the QNAP backup thingie sucks; it was breaking quite often. Also, for some reason I did not investigate, its performance was really bad. It started at a maximum of 50MB/s, but the last terabyte of data was copied at MUCH less than that, so it took much longer than I anticipated. Copying back was slow too, but much less so. Of course reading things usually is faster than writing, with it going at around 100MB/s most of the time, which is quite a bit more - still not what USB3 can actually do, but I guess the AMD chip doesn't want to go that fast.

TrueNAS experience The installation went mostly smoothly; the only real trouble had been on my side. Turns out that a bad network cable does NOT help the network setup, who would have thought. Other than that it is the usual set of questions you would expect, a reboot, and then some web interface. And here the differences start. The whole system boots up much faster - not even a third of the time compared to QTS. One important thing: as TrueNAS Scale is Debian based, and hence runs a Linux kernel, it automatically detects and assembles the old RAID arrays that QTS set up. TrueNAS can do nothing with them, so it helps to manually stop them and wipe the disks, as sketched below. Afterwards I put ZFS on the disks, with a similar setup to what I had before. The spinning rust are the data disks in a RAIDZ2 setup; the two SSDs are added as cache devices. Unlike MDADM, ZFS does not have a long sync process. Also unlike the MDADM/LVM/EXT4 setup from before, ZFS works differently: it manages the RAID part, but it also does the volume and filesystem parts. Quite different handling, and I'm still getting used to it, so no, I won't write a ZFS introduction now.
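A minimal sketch of that cleanup, assuming the auto-assembled arrays showed up as /dev/md126 and /dev/md127 (the actual names and member devices will differ):

# see what the kernel auto-assembled
cat /proc/mdstat
# stop the leftover QTS arrays (hypothetical names)
mdadm --stop /dev/md126
mdadm --stop /dev/md127
# remove RAID/LVM signatures from each member disk before ZFS takes over
wipefs -a /dev/sda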

Features The two systems cannot be compared completely, as they have pretty different target audiences. QNAP is more for the user that wants some network storage that offers a ton of extra features easily available via a clickable interface, while TrueNAS appears more oriented to people that want a fast but reliable storage system. TrueNAS does not offer all the extra bloat the QNAP delivers. Still, you have the ability to run virtual machines, and it seems it comes with Rancher, so some kubernetes/container ability is there. It lacks essential features like assigning PCI devices to virtual machines, so that is not useful right now, but I assume that will come in a future version. I am still exploring it all, but I like what I have right now. I am still rebuilding my setup to have all shares exported and used again, but the most important ones are working already.

16 November 2022

Antoine Beaupré: A ZFS migration

In my tubman setup, I started using ZFS on an old server I had lying around. The machine is really old though (2011!) and it "feels" pretty slow. I want to see how much of that is ZFS and how much is the machine. Synthetic benchmarks show that ZFS may be slower than mdadm in RAID-10 or RAID-6 configuration, so I want to confirm that on a live workload: my workstation. Plus, I want easy, regular, high performance backups (with send/receive snapshots) and there's no way I'm going to use BTRFS because I find it too confusing and unreliable. So off we go.

Installation Since this is a conversion (and not a new install), our procedure is slightly different than the official documentation but otherwise it's pretty much in the same spirit: we're going to use ZFS for everything, including the root filesystem. So, install the required packages, on the current system:
apt install --yes gdisk zfs-dkms zfs zfs-initramfs zfsutils-linux
We also tell DKMS that we need to rebuild the initrd when upgrading:
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf

Partitioning This is going to partition /dev/sdc with:
  • 1MB MBR / BIOS legacy boot
  • 512MB EFI boot
  • 1GB bpool, unencrypted pool for /boot
  • rest of the disk for zpool, the rest of the data
     sgdisk --zap-all /dev/sdc
     sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
     sgdisk     -n2:1M:+512M   -t2:EF00 /dev/sdc
     sgdisk     -n3:0:+1G      -t3:BF01 /dev/sdc
     sgdisk     -n4:0:0        -t4:BF00 /dev/sdc
    
That will look something like this:
    root@curie:/home/anarcat# sgdisk -p /dev/sdc
    Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
    Model: ESD-S1C         
    Sector size (logical/physical): 512/512 bytes
    Disk identifier (GUID): [REDACTED]
    Partition table holds up to 128 entries
    Main partition table begins at sector 2 and ends at sector 33
    First usable sector is 34, last usable sector is 1953525134
    Partitions will be aligned on 16-sector boundaries
    Total free space is 14 sectors (7.0 KiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1              48            2047   1000.0 KiB  EF02  
       2            2048         1050623   512.0 MiB   EF00  
       3         1050624         3147775   1024.0 MiB  BF01  
       4         3147776      1953525134   930.0 GiB   BF00
Unfortunately, we can't be sure of the sector size here, because the USB controller is probably lying to us about it. Normally, this smartctl command should tell us the sector size as well:
root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black Mobile
Device Model:     WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Above is the example of the built-in HDD drive. But the SSD device enclosed in that USB controller doesn't support SMART commands, so we can't trust that it really has 512 byte sectors. This matters because we need to set the ashift value correctly. We're going to go ahead and assume the SSD drive has the common 4KB sector size, which means ashift=12. Note here that we are not creating a separate partition for swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and that issue is still not fixed upstream. Ubuntu recommends using a separate partition for swap instead. But since this is "just" a workstation, we're betting that we will not suffer from this problem, after hearing a report from another Debian developer running this setup on their workstation successfully. We do not recommend this setup though. In fact, if I were to redo this partition scheme, I would probably use LUKS encryption and set up a dedicated swap partition, as I had problems with ZFS encryption as well.
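For what it's worth, you can at least double-check what the kernel believes about the device, and later confirm what ZFS actually used; a small sketch (the pool name matches the one created below):

# the kernel's view of the (possibly lying) USB device
cat /sys/block/sdc/queue/logical_block_size
cat /sys/block/sdc/queue/physical_block_size
# after pool creation, confirm the ashift that was actually applied
zpool get ashift rpool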

Creating pools ZFS pools are somewhat like "volume groups" if you are familiar with LVM, except they obviously also do things like RAID-10. (Even though LVM can technically also do RAID, people typically use mdadm instead.) In any case, the guide suggests creating two different pools here: one, in cleartext, for boot, and a separate, encrypted one, for the rest. Technically, the boot partition is required because the Grub bootloader only supports readonly ZFS pools, from what I understand. But I'm a little out of my depth here and just following the guide.

Boot pool creation This creates the boot pool in readonly mode with features that grub supports:
    zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off \
        -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool /dev/sdc3
I haven't investigated all those settings and just trust the upstream guide on the above.

Main pool creation This is a more typical pool creation.
    zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool /dev/sdc4
Breaking this down:
  • -o ashift=12: mentioned above, 4k sector size
  • -O encryption=on -O keylocation=prompt -O keyformat=passphrase: encryption, prompt for a password, default algorithm is aes-256-gcm, explicit in the guide, made implicit here
  • -O acltype=posixacl -O xattr=sa: enable ACLs, with better performance (not enabled by default)
  • -O dnodesize=auto: related to extended attributes, less compatibility with other implementations
  • -O compression=zstd: enable zstd compression; can be disabled/enabled per dataset with zfs set compression=off rpool/example
  • -O relatime=on: classic atime optimisation, another that could be used on a busy server is atime=off
  • -O canmount=off: do not make the pool mount automatically with mount -a?
  • -O mountpoint=/ -R /mnt: mount pool on / in the future, but /mnt for now
Those settings are all available in zfsprops(8). Other flags are defined in zpool-create(8). The reasoning behind them is also explained in the upstream guide and some also in [the Debian wiki][]. Those flags were actually not used:
  • -O normalization=formD: normalize file names on comparisons (not storage); implies utf8only=on, which is a bad idea (and effectively meant my first sync failed to copy some files, including this folder from a supysonic checkout). And this cannot be changed after the filesystem is created. Bad, bad, bad.
[the Debian wiki]: https://wiki.debian.org/ZFS#Advanced_Topics
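Since normalization and utf8only are locked in at creation time, it's worth double-checking such properties right after creating a pool, before any data is copied in; for example:

# verify creation-time properties before committing data to the pool
zfs get normalization,utf8only,compression,encryption rpool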

Side note about single-disk pools Also note that we're living dangerously here: single-disk ZFS pools are rumoured to be more dangerous than not running ZFS at all. The choice quote from this article is:
[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but it's actually not. The reason it's not is that ZFS' metadata cannot be allowed to be corrupted. If it is, it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.
Compared with (say) ext4, where a single disk error can be recovered, this is pretty bad. But we are ready to live with this, with the idea that we'll have hourly offline snapshots that we can easily recover from. It's a trade-off. Also, we're running this on an NVMe/M.2 drive, which typically just blinks out of existence completely and doesn't "bit rot" the way a HDD would. Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured, at least.
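Those hourly snapshots can be as simple as a recursive zfs snapshot fired from cron; a minimal sketch (the naming scheme is made up here, and tools like sanoid handle rotation properly):

# recursive, timestamped snapshot of the whole pool
zfs snapshot -r "rpool@hourly-$(date +%Y%m%d-%H%M)"
# roll a dataset back to a known-good snapshot after a mistake
zfs rollback rpool/home@hourly-20220517-1300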

Creating mount points Next we create the actual filesystems, known as "datasets" which are the things that get mounted on mountpoint and hold the actual files.
  • this creates two containers, for ROOT and BOOT
     zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
     zfs create -o canmount=off -o mountpoint=none bpool/BOOT
    
    Note that it's unclear to me why those datasets are necessary, but they seem common practice, also used in this FreeBSD example. The OpenZFS guide mentions the Solaris upgrades and Ubuntu's zsys that use that container for upgrades and rollbacks. This blog post seems to explain a bit the layout behind the installer.
  • this creates the actual boot and root filesystems:
     zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
     zfs mount rpool/ROOT/debian &&
     zfs create -o mountpoint=/boot bpool/BOOT/debian
    
    I guess the debian name here is because we could technically have multiple operating systems with the same underlying datasets.
  • then the main datasets:
     zfs create                                 rpool/home &&
     zfs create -o mountpoint=/root             rpool/home/root &&
     chmod 700 /mnt/root &&
     zfs create                                 rpool/var
    
  • exclude temporary files from snapshots:
     zfs create -o com.sun:auto-snapshot=false  rpool/var/cache &&
     zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp &&
     chmod 1777 /mnt/var/tmp
    
  • and skip automatic snapshots in Docker:
     zfs create -o canmount=off                 rpool/var/lib &&
     zfs create -o com.sun:auto-snapshot=false  rpool/var/lib/docker
    
    Notice here a peculiarity: we must create rpool/var/lib to create rpool/var/lib/docker otherwise we get this error:
     cannot create 'rpool/var/lib/docker': parent does not exist
    
    ... and no, just creating /mnt/var/lib doesn't fix that problem. In fact, it makes things even more confusing because an existing directory shadows a mountpoint, which is the opposite of how things normally work. Also note that you will probably need to change storage driver in Docker, see the zfs-driver documentation for details but, basically, I did:
    echo '{ "storage-driver": "zfs" }' > /etc/docker/daemon.json
    
    Note that podman has the same problem (and similar solution):
    printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
    
  • make a tmpfs for /run:
     mkdir /mnt/run &&
     mount -t tmpfs tmpfs /mnt/run &&
     mkdir /mnt/run/lock
    
We don't create a /srv, as that's the HDD stuff. Also mount the EFI partition:
mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/
At this point, everything should be mounted in /mnt. It should look like this:
root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem            Size  Used Avail Use% Mounted on
rpool/ROOT/debian     899G  384K  899G   1% /mnt
bpool/BOOT/debian     832M  123M  709M  15% /mnt/boot
rpool/home            899G  256K  899G   1% /mnt/home
rpool/home/root       899G  256K  899G   1% /mnt/root
rpool/var             899G  384K  899G   1% /mnt/var
rpool/var/cache       899G  256K  899G   1% /mnt/var/cache
rpool/var/tmp         899G  256K  899G   1% /mnt/var/tmp
rpool/var/lib/docker  899G  256K  899G   1% /mnt/var/lib/docker
/dev/sdc2             511M  4.0K  511M   1% /mnt/boot/efi
Now that we have everything setup and mounted, let's copy all files over.

Copying files This syncs all the mounted filesystems over:
for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
You can check that the list is correct with:
mount -l -t ext4,btrfs,vfat | awk '{print $3}'
Note that we skip /srv as it's on a different disk. On the first run, we had:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
     16,831,437 100%  184.14MB/s    0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
 28,019,293,280  94%   47.63MB/s    0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,267,990  98%   50.71MB/s    0:10:40 (xfr#736577, to-chk=0/867732)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
 24,456,268,098  98%   68.03MB/s    0:05:42 (xfr#159867, ir-chk=6875/172377) 
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125  93%   74.82MB/s    0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978  96%   76.82MB/s    1:05:15 (xfr#2256473, to-chk=0/2993950)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Note the failure to transfer that supysonic file? It turns out they had a weird filename in their source tree, since then removed, but still it showed how the utf8only feature might not be such a bad idea. At this point, the procedure was restarted all the way back to "Creating pools", after unmounting all ZFS filesystems (umount /mnt/run /mnt/boot/efi && umount -t zfs -a) and destroying the pool, which, surprisingly, doesn't require any confirmation (zpool destroy rpool). The second run was cleaner:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/110)  
syncing / to /mnt/...
 28,019,033,070  97%   42.03MB/s    0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,807,102  98%   44.84MB/s    0:12:04 (xfr#736580, to-chk=0/867723)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
 24,043,086,450  96%   62.03MB/s    0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507  96%   67.09MB/s    1:14:43 (xfr#2256845, to-chk=0/2994364)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Also note the transfer speed: we seem capped at 76MB/s, or 608Mbit/s. This is not as fast as I was expecting: the USB connection seems to be at around 5Gbps:
anarcat@curie:~$ lsusb -tv | head -4
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
        ID 0b05:1932 ASUSTek Computer, Inc.
So it shouldn't cap at that speed. It's possible the USB adapter is failing to give me the full speed though. It's not the M.2 SSD drive either, as that has a ~500MB/s bandwidth, according to its spec. At this point, we're about ready to do the final configuration. We drop to single user mode and do the rest of the procedure. That used to be shutdown now, but it seems like the systemd switch broke that, so now you can reboot into grub and pick the "recovery" option. Alternatively, you might try systemctl rescue, as I found out. I also wanted to copy the drive over to another new NVMe drive, but that failed: it looks like the USB controller I have doesn't work with older, non-NVMe drives.

Boot configuration Now we need to enter the new system to rebuild the boot loader, initrd, and so on. First, we bind-mount the virtual filesystems and chroot into the ZFS disk:
mount --rbind /dev  /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys  /mnt/sys &&
chroot /mnt /bin/bash
Next we add an extra service that imports the bpool on boot, to make sure it survives a zpool.cache destruction:
cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
EOF
Enable the service:
systemctl enable zfs-import-bpool.service
I had to trim down /etc/fstab and /etc/crypttab to only contain references to the legacy filesystems (/srv is still BTRFS!). If we don't already have a tmpfs defined in /etc/fstab:
ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount
Rebuild boot loader with support for ZFS, but also to workaround GRUB's missing zpool-features support:
grub-probe /boot | grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub
For good measure, make sure the right disk is configured here, for example you might want to tag both drives in a RAID array:
dpkg-reconfigure grub-pc
Install grub to EFI while you're there:
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy
Filesystem mount ordering. The rationale here in the OpenZFS guide is a little strange, but I don't dare ignore that.
mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &
Verify that zed updated the cache by making sure these are not empty:
cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool
Once the files have data, stop zed:
fg
Press Ctrl-C.
Fix the paths to eliminate /mnt:
sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
Snapshot initial install:
zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install
Exit chroot:
exit

Finalizing One last sync was done in rescue mode:
for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
Then we unmount all filesystems:
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a
Reboot, swap the drives, and boot in ZFS. Hurray!

Benchmarks This is a test that was run in single-user mode using fio and the Ars Technica recommended tests, which are:
  • Single 4KiB random write process:
     fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
    
  • 16 parallel 64KiB random write processes:
     fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
    
  • Single 1MiB random write process:
     fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
    
Strangely, that's not exactly what the author, Jim Salter, did in his actual test bench used in the ZFS benchmarking article. The first thing is there's no read test at all, which is already pretty strange. It also doesn't include things like dropping caches or repeating results. So here's my variation, which I called fio-ars-bench.sh for now. It just batches a bunch of fio tests, one by one, 60 seconds each. It should take about 12 minutes to run, as there are three pairs of tests, read/write, with and without fsync. My bias, before building, running and analysing those results, is that ZFS should outperform the traditional stack on writes, but possibly not on reads. It's also possible it outperforms it on both, because it's a newer drive. A new test might be possible with a new external USB drive as well, although I doubt I will find the time to do this.
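The script itself isn't reproduced here, but its general shape is easy to sketch; the following is a hypothetical reconstruction, not the actual fio-ars-bench.sh, reusing the parameters from the tests above (the --fsync=1 variants are omitted for brevity):

#!/bin/sh
# run each fio test serially, 60 seconds each, appending to a log
for rw in randread randwrite; do
    for spec in "4k 4g 1" "64k 256m 16" "1m 16g 1"; do
        set -- $spec  # split into block size, total size, job count
        fio --name="$rw-$1-$2-$3x" --ioengine=posixaio --rw="$rw" \
            --bs="$1" --size="$2" --numjobs="$3" --iodepth="$3" \
            --runtime=60 --time_based --end_fsync=1 \
            >> "job-$(hostname).log"
    done
done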

Results All tests were done on WD Blue SN550 drives, which claim to be able to push 2400MB/s read and 1750MB/s write. An extra drive was bought to move the LVM setup from a WDC WDS500G1B0B-00AS40 SSD, a WD Blue M.2 2280 SSD that was at least 5 years old, spec'd at 560MB/s read, 530MB/s write. Benchmarks done on that older M.2 SSD were discarded so that the drive difference is not a factor in the test. In practice, I'm going to assume we'll never reach those numbers because we're not actually NVMe-attached (this is an old workstation!) so the bottleneck isn't the disk itself. For our purposes, it might still give us useful results.

Rescue test, LUKS/LVM/ext4 Those tests were performed with everything shut down, after either entering the system in rescue mode, or by reaching that target with:
systemctl rescue
The network might have been started before or after the test as well:
systemctl start systemd-networkd
So it should be fairly reliable as basically nothing else is running. Raw numbers, from the job-curie-lvm.log, converted to MiB/s and manually merged:
test                       read MiB/s   read IOPS   write MiB/s   write IOPS
rand4k4g1x                      39.27       10052        212.15        54310
rand4k4g1x--fsync=1             39.29       10057          2.73          699
rand64k256m16x                1297.00       20751       1068.57        17097
rand64k256m16x--fsync=1       1290.90       20654        353.82         5661
rand1m16g1x                    315.15         315        563.77          563
rand1m16g1x--fsync=1           345.88         345        157.01          157
Peaks are at about 20k IOPS and ~1.3GiB/s read, 1GiB/s write in the 64KB blocks with 16 jobs. Slowest is the random 4k block sync write at an abysmal 3MB/s and 700 IOPS. The 1MB read/write tests have lower IOPS, but that is expected.
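As an aside, the manual conversion could be avoided: fio can emit JSON, where bandwidth is reported in KiB/s, so something like this sketch (assuming jq is installed) extracts MiB/s directly:

# run one job with JSON output and pull out bandwidth in MiB/s
fio --output-format=json --name=randwrite4k1x --ioengine=posixaio \
    --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --end_fsync=1 |
  jq '.jobs[] | {job: .jobname, read_MiBps: (.read.bw/1024), write_MiBps: (.write.bw/1024)}'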

Rescue test, ZFS This test was also performed in rescue mode. Raw numbers, from the job-curie-zfs.log, converted to MiB/s and manually merged:
test                       read MiB/s   read IOPS   write MiB/s   write IOPS
rand4k4g1x                      77.20       19763         27.13         6944
rand4k4g1x--fsync=1             76.16       19495          6.53         1673
rand64k256m16x                1882.40       30118         70.58         1129
rand64k256m16x--fsync=1       1865.13       29842         71.98         1151
rand1m16g1x                    921.62         921        102.21          102
rand1m16g1x--fsync=1           908.37         908         64.30           64
Peaks are at 1.8GiB/s read, also in the 64k job like above, but much faster. The write is, as expected, much slower at 70MiB/s (compared to 1GiB/s!), but it should be noted the sync write doesn't degrade performance compared to async writes (although it's still below the LVM 300MB/s).

Conclusions Really, ZFS has trouble performing in all write conditions. The random 4k sync write test is the only place where ZFS outperforms LVM in writes, and barely (7MiB/s vs 3MiB/s). Everywhere else, writes are much slower, sometimes by an order of magnitude. And before some ZFS zealot jumps in talking about the SLOG or some other cache that could be added to improve performance, I'll remind you that those numbers are on a bare-bones NVMe drive, pretty much as fast a storage device as you can find on this machine. Adding another NVMe drive as a cache probably will not improve write performance here. Still, those are very different results than the tests performed by Salter, which show ZFS beating traditional configurations in all categories but uncached 4k reads (not writes!). That said, those tests are very different from the tests I performed here, where I test writes on a single disk, not a RAID array, which might explain the discrepancy. Also, note that neither LVM nor ZFS manages to reach the 2400MB/s read and 1750MB/s write performance specification. ZFS does manage to reach 82% of the read performance (1973MB/s) and LVM 64% of the write performance (1120MB/s). LVM hits 57% of the read performance and ZFS hits barely 6% of the write performance. Overall, I'm a bit disappointed in the ZFS write performance here, I must say. Maybe I need to tweak the record size or some other ZFS voodoo, but I'll note that I didn't have to do any such configuration on the other side to kick ZFS in the pants...
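If I do get around to that tuning, recordsize is a per-dataset property, so experiments can be targeted; a sketch (the values are guesses, not recommendations):

# larger records for mostly-sequential data, smaller for random-write loads
zfs set recordsize=1M rpool/home
zfs set recordsize=16K rpool/var/lib/docker
# this only affects newly written blocks; verify with:
zfs get recordsize rpool/home rpool/var/lib/docker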

Real world experience This section documents not synthetic benchmarks, but actual real-world workloads, comparing before and after I switched my workstation to ZFS.

Docker performance I had the feeling that running some git hook (which was firing a Docker container) was "slower" somehow. It seems that, at runtime, ZFS backends are significantly slower than their overlayfs/ext4 equivalents:
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
Translating this:
  • container setup: ~1 second
  • container runtime: ~1 second
  • container teardown: ~1 second
  • total runtime: 2-3 seconds
Obviously, those timestamps are not quite accurate enough to make precise measurements... After I switched to ZFS:
mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080 
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142. 
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded. 
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" 
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
That's double or triple the run time, from 2 seconds to 6 seconds. Most of the time is spent in run time, inside the container. Here's the breakdown:
  • container setup: ~2 seconds
  • container run: ~4 seconds
  • container teardown: ~1 second
  • total run time: about ~6-7 seconds
That's a two- to three-fold increase! Clearly something is going on here that I should tweak. It's possible that code path is less optimized in Docker. I also worry about podman, but apparently it also supports ZFS backends. Possibly it would perform better, but at this stage I wouldn't have a good comparison: maybe it would have performed better on non-ZFS as well...

Interactivity While doing the offsite backups (below), the system became somewhat "sluggish". I felt everything was slow, and I estimate it introduced ~50ms latency in any input device. Arguably, those are all USB and the external drive was connected through USB, but I suspect the ZFS drivers are not as well tuned with the scheduler as the regular filesystem drivers...

Recovery procedures For test purposes, I unmounted all systems during the procedure:
umount /mnt/boot/efi /mnt/run
umount -a -t zfs
zpool export -a
And disconnected the drive, to see how I would recover this system from another Linux system in case of a total motherboard failure. To import an existing pool, plug the device, then import the pool with an alternate root, so it doesn't mount over your existing filesystems, then you mount the root filesystem and all the others:
zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock

Offsite backup Part of the goal of using ZFS is to simplify and harden backups. I wanted to experiment with shorter recovery times, specifically both the point-in-time recovery objective and the recovery time objective, and with faster incremental backups. This is, therefore, part of my backup services. This section documents how an external NVMe enclosure was set up in a pool to mirror the datasets from my workstation. The final setup should include syncoid copying datasets to the backup server regularly, but I haven't finished that configuration yet.
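When that configuration does land, the recurring sync could be as simple as a cron job (or systemd timer) invoking syncoid over SSH; a sketch, with the backup hostname and target pool name being hypothetical:

# push all datasets recursively to the backup host (hypothetical names)
syncoid -r rpool root@backup.example.com:rpool-backup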

Partitioning The above partitioning procedure used sgdisk, but I couldn't figure out how to replicate a partition table with it, so this uses sfdisk to dump the partition table from the first disk onto an external, identical drive:
sfdisk -d /dev/nvme0n1 | sfdisk --no-reread /dev/sda --force

Pool creation This is similar to the main pool creation, except we tweaked a few bits after changing the upstream procedure:
zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O xattr=sa \
        -O compression=lz4 \
        -O devices=off \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/boot -R /mnt \
        bpool-tubman /dev/sdb3
The change from the main boot pool above is that normalization=formD is dropped (see the discussion of that flag earlier). Main pool creation is:
zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool-tubman /dev/sdb4

First sync I used syncoid to copy all pools over to the external device. syncoid is part of the sanoid project and is specifically designed to sync snapshots between pools, typically over SSH links, but it can also operate locally. The sanoid command had a --readonly argument to simulate changes, but syncoid didn't, so I tried to fix that with an upstream PR. It seems it would be better to do this by hand, but this was much easier. The full first sync was:
root@curie:/home/anarcat# ./bin/syncoid -r  bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create bpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%            
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================>                                                         ] 53%            
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
 126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
 113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r  rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create rpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================>  ] 99%            
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
 277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
 627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
 443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%            
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================>                                                                            ] 38%            
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%            
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
 219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%
Funny how the CRITICAL ERROR doesn't actually stop syncoid; it just carries on merrily even while telling you it's "cowardly refusing to destroy your existing target"... Maybe that's because my pull request broke something, though. During the transfer, the computer was very sluggish: everything feels like it has ~30-50ms of extra latency:
anarcat@curie:sanoid$ LANG=C top -b -n 1 | head -20
top - 13:07:05 up 6 days,  4:01,  1 user,  load average: 16.13, 16.55, 11.83
Tasks: 606 total,   6 running, 598 sleeping,   0 stopped,   2 zombie
%Cpu(s): 18.8 us, 72.5 sy,  1.2 ni,  5.0 id,  1.2 wa,  0.0 hi,  1.2 si,  0.0 st
MiB Mem :  15898.4 total,   1387.6 free,  13170.0 used,   1340.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1319.8 avail Mem 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     70 root      20   0       0      0      0 S  83.3   0.0   6:12.67 kswapd0
4024878 root      20   0  282644  96432  10288 S  44.4   0.6   0:11.43 puppet
3896136 root      20   0   35328  16528     48 S  22.2   0.1   2:08.04 mbuffer
3896135 root      20   0   10328    776    168 R  16.7   0.0   1:22.93 zfs
3896138 root      20   0   10588    788    156 R  16.7   0.0   1:49.30 zfs
    350 root       0 -20       0      0      0 R  11.1   0.0   1:03.53 z_rd_int
    351 root       0 -20       0      0      0 S  11.1   0.0   1:04.15 z_rd_int
3896137 root      20   0    4384    352    244 R  11.1   0.0   0:44.73 pv
4034094 anarcat   30  10   20028  13960   2428 S  11.1   0.1   0:00.70 mbsync
4036539 anarcat   20   0    9604   3464   2408 R  11.1   0.0   0:00.04 top
    352 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    353 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    354 root       0 -20       0      0      0 S   5.6   0.0   1:04.01 z_rd_int
I wonder how much of that is due to syncoid, particularly because I often saw mbuffer and pv in there, which are not strictly necessary for those kinds of operations, as far as I understand. Once that's done, export the pools to disconnect the drive:
zpool export bpool-tubman
zpool export rpool-tubman

Raw disk benchmark Copied the 512GB SSD/M.2 device to another 1024GB NVMe/M.2 device:
anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 bytes (500 GB, 466 GiB) copied, 1713 s, 292 MB/s
119235+1 records in
119235+1 records out
500107862016 bytes (500 GB, 466 GiB) copied, 1719.93 s, 291 MB/s
... while both over USB, whoohoo 300MB/s!

Monitoring You should be monitoring your ZFS pools regularly. Normally, the zed(8) daemon monitors all ZFS events. It is the thing that will report when a scrub fails, for example. See this configuration guide. Scrubs should be regularly scheduled to ensure consistency of the pool. This can be done in newer zfsutils-linux versions (bullseye-backports or bookworm) with one of these, depending on the desired frequency:
systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@rpool.timer --now
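On older zfsutils-linux versions that lack those timer units, a plain cron entry does the same job; a minimal sketch:

# weekly scrub, Sundays at 3am, via a cron.d snippet
echo '0 3 * * 0 root /sbin/zpool scrub rpool' > /etc/cron.d/zfs-scrub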
When the scrub runs, if it finds anything, it will send an event which gets picked up by the zed daemon, which will then send a notification; see below for an example. TODO: deploy on curie, if possible (probably not, because no RAID). TODO: this should be in Puppet.

Scrub warning example So what happens when problems are found? Here's an example of how I dealt with an error I received. After setting up another server (tubman) with ZFS, I eventually ended up getting a warning from the ZFS toolchain.
Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
   eid: 39536
 class: scrub_finish
  host: tubman
  time: 2022-10-09 00:58:07-0400
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct  9 00:58:07 2022
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb4    ONLINE       0     1     0
            sdc4    ONLINE       0     0     0
        cache
          sda3      ONLINE       0     0     0
errors: No known data errors
This, in itself, is a little worrisome. But it helpfully links to this more detailed documentation (and props up there: the link still works), which explains this is a "minor" problem (something that could be included in the report). In this case, this happened on a server set up on 2021-04-28, but the disks and server hardware are much older. The server itself (marcos v1) was built around 2011, over 10 years ago now. The hard drive in question is:
root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Some more SMART stats:
root@tubman:~# smartctl -a -qnoserial /dev/sdb | grep -e Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12464 (206 202 0)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10966h+55m+23.757s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21107792664
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3201579750
That's over a year of power on, which shouldn't be so bad. It has written about 10TB of data (21107792664 LBAs * 512 bytes/LBA), which is about two full writes. According to its specification, this device is supposed to support 55 TB/year of writes, so we're far below spec. Note that we are still far from the "non-recoverable read error per bits" spec (1 per 10E15), as we've basically read 13E12 bits (3201579750 LBAs * 512 bytes/LBA * 8 bits/byte = 13E12 bits). It's likely this disk was made in 2018, so it is in its fourth year. Interestingly, /dev/sdc is also a Seagate drive, but of a different series:
root@tubman:~# smartctl -qnoserial  -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
It has seen much more reads than the other disk which is also interesting:
root@tubman:~# smartctl -a -qnoserial /dev/sdc | grep -e Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       36240
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       33994h+10m+52.118s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       30730174438
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       51894566538
That's almost 4 years of Head_Flying_Hours, and over 4 years (4 years and 48 days) of Power_On_Hours. The copyright date on that drive's specs goes back to 2016, so it's a much older drive. The SMART self-test succeeded.
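Since the drive otherwise checks out, the remaining step the ZFS-8000-9P documentation suggests is to clear the error counters and keep watching the device; roughly:

# inspect the error counters, then reset them for the affected vdev
zpool status -v rpool
zpool clear rpool sdb4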

Remaining issues
  • TODO: move send/receive backups to offsite host, see also zfs for alternatives to syncoid/sanoid there
  • TODO: set up backup cron job (or timer? see the sketch after this list)
  • TODO: swap still not set up on curie, see zfs
  • TODO: document this somewhere: bpool and rpool are both pools and datasets. That's pretty confusing, but also very useful because it allows for pool-wide recursive snapshots, which are used for the backup system
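For the timer option, here is a minimal sketch of what a systemd unit pair could look like; the unit names, dataset, and target host are made up, and it assumes syncoid can already reach the offsite host over SSH:
# /etc/systemd/system/zfs-backup.service (hypothetical name)
[Unit]
Description=replicate pools offsite with syncoid

[Service]
Type=oneshot
ExecStart=/usr/sbin/syncoid --recursive rpool backup@offsite.example.com:rpool-backup

# /etc/systemd/system/zfs-backup.timer
[Unit]
Description=daily offsite ZFS replication

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
Then systemctl enable --now zfs-backup.timer schedules it daily, with Persistent=true catching up after downtime.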

fio improvements I really want to improve my experience with fio. Right now, I'm just cargo-culting stuff from other folks and I don't really like it. stressant is a good example of my struggles, in the sense that it doesn't really work that well for disk tests. I would love to have just a single .fio job file that lists multiple jobs to run serially. For example, this file describes the above workload pretty well:
[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1
... except the jobs are actually started in parallel, even though they are stonewall'd, as far as I can tell from the reports. I sent a mail to the fio mailing list for clarification. It looks like the jobs are started in parallel, but actually (and correctly) run serially. It seems like this might just be a matter of reporting the right timestamps in the end, although it does feel like starting all the processes (even if they are not doing any work yet) could skew the results.
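In the meantime, a workaround for the confusing report is to ask fio for machine-readable output and look at each job's own runtime figures instead of the wall-clock ordering (standard fio flags; the file names are just examples):
fio tests.fio --output-format=json --output=tests.json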

Hangs during procedure During the procedure, there were a few times when any ZFS command would completely hang. It seems that using an external USB drive to sync stuff didn't work so well: sometimes it would reconnect under a different device name (from sdc to sdd, for example), and this would greatly confuse ZFS. Here, for example, is sdd reappearing out of the blue:
May 19 11:22:53 curie kernel: [  699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [  699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [  699.922433] scsi 4:0:0:0: Direct-Access     ROG      ESD-S1C          0    PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [  699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [  699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [  699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [  699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [  699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [  699.961602]  sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [  699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk
Next time I run a ZFS command (say zpool list), the command completely hangs (D state) and this comes up in the logs:
May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648] 
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync        state:D stack:    0 pid:  997 ppid:     2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670]  schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675]  ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678]  io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689]  __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702]  __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816]  zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929]  dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032]  spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245]  txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354]  ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366]  thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376]  ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379]  kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382]  ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386]  ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412]  ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420]  schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424]  __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435]  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537]  spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644]  zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool           state:D stack:    0 pid:11815 ppid:  3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995]  io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004]  cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118]  txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223]  txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325]  spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430]  ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
Here's another example, where you see the USB controller bleeping out and back into existence:
May 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
May 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
May 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
May 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
May 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
May 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 19 11:39:25 curie kernel: task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
May 19 11:39:25 curie kernel: Call Trace:
May 19 11:39:25 curie kernel:  __schedule+0x282/0x870
May 19 11:39:25 curie kernel:  ? __kmalloc_node+0x141/0x2b0
May 19 11:39:25 curie kernel:  schedule+0x46/0xb0
May 19 11:39:25 curie kernel:  schedule_preempt_disabled+0xa/0x10
May 19 11:39:25 curie kernel:  __mutex_lock.constprop.0+0x133/0x460
May 19 11:39:25 curie kernel:  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:39:25 curie kernel:  spa_all_configs+0x41/0x120 [zfs]
May 19 11:39:25 curie kernel:  zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
May 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
May 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
May 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
May 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
May 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
May 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 19 11:39:25 curie kernel: task:zpool           state:D stack:    0 pid:11815 ppid:  2621 flags:0x00004004
May 19 11:39:25 curie kernel: Call Trace:
May 19 11:39:25 curie kernel:  __schedule+0x282/0x870
May 19 11:39:25 curie kernel:  schedule+0x46/0xb0
May 19 11:39:25 curie kernel:  io_schedule+0x42/0x70
May 19 11:39:25 curie kernel:  cv_wait_common+0xac/0x130 [spl]
May 19 11:39:25 curie kernel:  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:39:25 curie kernel:  txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:39:25 curie kernel:  txg_wait_synced+0xc/0x40 [zfs]
May 19 11:39:25 curie kernel:  spa_export_common+0x4cd/0x590 [zfs]
May 19 11:39:25 curie kernel:  ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
May 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
May 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
May 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
May 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
I understand those are rather extreme conditions: I would fully expect the pool to stop working if the underlying drives disappear. What doesn't seem acceptable is that a command would completely hang like this.
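For what it's worth, part of this behavior is governed by the pool's failmode property: the default, wait, blocks all I/O until the error is cleared, which is consistent with the hangs above. A sketch of the relevant knobs, not something I have verified would have saved this particular situation:
# the default failmode is "wait", which blocks I/O until recovery
zpool get failmode bpool
# "continue" returns EIO on new writes instead of blocking everything
zpool set failmode=continue bpool
# once the device is back, this resumes a suspended pool
zpool clear bpool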

References See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

6 November 2022

Russ Allbery: Review: Matrix

Review: Matrix, by Lauren Groff
Publisher: Riverhead Books
Copyright: 2021
ISBN: 0-698-40513-7
Format: Kindle
Pages: 260
Marie is a royal bastardess, a product of rape no less, and entirely out of place in the court in Westminster, where she landed after being kicked off her mother's farm. She had run the farm since her mother's untimely death, but there was no way that her relatives would let her inherit. In court, Marie is too tall, too ugly, and too strange, raised by women who were said to carry the blood of the fairy Mélusine. Eleanor of Aquitaine's solution to her unwanted house guest is a Papal commission. Marie is to become the prioress of an abbey. I am occasionally unpleasantly reminded of why I don't read very much literary fiction. It's immensely frustrating to read a book in which the author cares about entirely different things than the reader, and where the story beats all land wrong. This is literary historical fiction set in the 12th century. Marie is Marie de France, author of the lais about courtly love that are famous primarily due to their position as early sources for the legends of King Arthur. The lais are written on-screen very early in this book, but they disappear without any meaningful impact on the story. Matrix is, instead, about Shaftesbury Abbey and what it becomes during Marie's time as prioress and then abbess, following the theory that Marie de France was Mary of Shaftesbury. What I thought I was getting in this book, from numerous reviews and recommendations, was a story of unexpected competence: how a wild, unwanted child of seventeen lands at a dilapidated and starving abbey, entirely against her will, and then over the next sixty years transforms it into one of the richest abbeys in England. This does happen in this book, but Groff doesn't seem to care about the details of that transformation at all. Instead, Matrix takes the mimetic fiction approach of detailed and precise description of a few characters, with all of their flaws and complexities, and with all of the narrative's attention turned to how they are feeling and what they are seeing. It is also deeply, fully committed to a Great Man (or in this case a Great Woman) view of history. Marie is singular. The narrative follows her alone, she makes all the significant decisions, and the development of the abbey is determined by her apparently mystical visions. (In typical mimetic fashion, these are presented as real to her, and the novel takes no position on whether that reality is objective.) She builds spy networks, maneuvers through local and church politics, and runs the abbey like her personal kingdom. The tiny amount of this that is necessarily done by other people is attributed to Marie's ability to judge character. Other people's motives are simply steamrolled over and have no effect. Maddeningly, essentially all of this happens off-screen, and Groff is completely uninterested in the details of how any of it is accomplished. Marie decides to do something, the narrative skips forward a year, and it has happened. She decides to build something, and then it's built. She decides to collect the rents she's due, the novel gestures vaguely at how she's intimidating, and then everyone is happily paying up. She builds spy networks; who cares how? She maneuvers through crises of local and church politics that are vaguely alluded to, through techniques that are apparently too uninteresting to bother the reader with. Instead, the narrative focuses on two things: her deeply dysfunctional, parasocial relationship with Eleanor, and her tyrannical relationship with the other nuns.
I suspect that Groff would strongly disagree with my characterization of both of those narratives, and that's the other half of my problem with this book. Marie is obsessed with and in love with Eleanor, a completely impossible love to even talk about, and therefore turns to courtly love from afar as a model into which she can fit her feelings. While this is the setup for a tragedy, it's a good idea for a story. But what undermined it for me is that Marie's obsession seems to be largely physical (she constantly dwells on Eleanor's beauty), and Eleanor is absolutely horrible to her in every way: condescending, contemptuous, dismissive, and completely uninterested. This does change a bit over the course of the book, but not enough to justify the crush that Marie maintains for this awful person through her entire life. And Eleanor is the only person in the book who Marie treats like an equal. Everyone else is a subordinate, a daughter, a charge, a servant, or a worker. The nuns of the abbey prosper under her rule, so Marie has ample reason to justify this to herself, but no one else's opinions or beliefs matter to her in any significant way. The closest anyone can come to friendship is to be reliably obedient, perhaps after some initial objections that Marie overrules. Despite some quite good characterization of the other nuns, none of the other characters get to do anything. There is no delight in teamwork, sense of healthy community, or collaborative problem-solving. It's just all Marie, all the time, imposing her vision on the world both living and non-living through sheer force of will. This just isn't entertaining, at least for me. The writing might be beautiful, the descriptions detailed and effective, and the research clearly extensive, but I read books primarily for characters, I read characters primarily for their relationships, and these relationships are deeply, horribly unhealthy. They are not, to be clear, unrealistic (although I do think there's way too much chosen one in Marie and way too many problems that disappear off-camera); there are certainly people in the world with dysfunctional obsessive relationships, and there are charismatic people who overwhelm everyone around them. This is just not what I want to read about. You might think, with all I've said above, that I'm spoiling a lot of the book, but weirdly I don't think I am. Every pattern I mention above is well-established early in the novel. About the only thing that I'm spoiling is the hope that any of it is somehow going to change, a hope that I clung to for rather too long. This is a great setup for a book, and I wish it were written by a fantasy author instead of a literary author. Perhaps I'm being too harsh on literary fiction here, but I feel like fantasy authors are more likely to write for readers who want to see the growth sequence. If someone is going to change the world, I want to see how they changed the world. The mode of fantasy writing tends to think that what people do (and how they do it) is as interesting or more interesting than what they feel or what they perceive. If this idea, with the exact same level of (minor) mysticism and historic realism, without added magic, were written by, say, Guy Gavriel Kay or Nicola Griffith, it would be a far different and, in my opinion, a much better book. In fact, Hild is part of this book written by Nicola Griffith, and it is a much better book. 
I have seen enough people rave about this book to know that this is a personal reaction that is not going to be shared by everyone, or possibly even most people. My theory is that this is due to the different reading protocols between literary fiction readers and fantasy readers. I put myself in the latter camp; if you prefer literary fiction, you may like this much better (and also I'm not sure you'll find my book reviews useful). I may be wrong, though; maybe there are fantasy readers who would like this. I will say that the sense of place is very strong and the writing has all the expected literary strengths of descriptiveness and rhythm. But, sadly, this was not at all my thing, and I'm irritated that I wasted time on it. Rating: 4 out of 10

26 October 2022

Russ Allbery: Review: The Golden Enclaves

Review: The Golden Enclaves, by Naomi Novik
Series: The Scholomance #3
Publisher: Del Rey
Copyright: 2022
ISBN: 0-593-15836-9
Format: Kindle
Pages: 408
The Golden Enclaves is the third and concluding book of the Scholomance trilogy and picks up literally the instant after the end of The Last Graduate. The three books form a coherent and complete story that under absolutely no circumstances should be read out of order. This is an impossible review to write because everything is a spoiler. You're only going to read this book if you've read and liked the first two, and in that case you do not want to know a single detail about this book before you read it. The timing of revelations was absolutely perfect; I repeatedly figured out what was going on at exactly the same time that El did, which rarely happens in a book. (And from talking to friends I am not the only one.) If you're still deciding whether to read the series, or are deciding how to prioritize the third book, here are the things you need to know:
  1. Novik nails the ending. Absolutely knocks it out of the park.
  2. Everything is explained, and the explanation was wholly satisfying.
  3. There is more Liesel, and she's even better in the third book.
  4. El's relationship with her mother still works perfectly.
  5. Holy shit.
You can now stop reading this review here and go read, assured that this is the best work of Novik's career to date and has become my favorite fantasy series of all time, something I do not say lightly. For those who want some elaboration, I'll gush some more about this book, but the above is all you need to know. There are so many things that I loved about this series, but the most impressive to me is how each book broadens the scope of the story while maintaining full continuity with the characters and plot. Novik moves from individuals to small groups to, in this book, systems and social forces without dropping a beat and without ever losing the characters. She could have written a series only about El and her friends and it still would have been amazing, but each book takes a risky leap into a broader perspective and she pulls it off every time. This is also one of the most enjoyable first-person perspectives that I have ever read. (I think only Code Name Verity competes, and that's my favorite novel of all time.) Whether you like this series at all will depend on whether you like El, because you spend the entire series inside her head. I loved every moment of it. Novik not infrequently pauses the action to give the reader a page or four of El's internal monologue, and I not only didn't mind, I thought those were the best parts of the book. El is such a deep character: stubborn, thoughtful, sarcastic, impulsive, but also ethical and self-aware in a grudging sort of way that I found utterly compelling to read. And her friends! The friendship dynamics are so great. We sadly don't see as much of Liu in this book (for very good reasons, but I would gladly read an unnecessary sequel novella that existed just to give Liu more time with her friends), but everyone else is here, and in exchange we get much more of Liesel. There should be an Oscar for best supporting character in a novel just so that Liesel can win it. Why are there not more impatient, no-nonsense project managers in fiction? There are a couple of moments between El and Liesel that are among my favorite character interactions in fiction. This is also a series in which the author understands what the characters did in the previous books and the bonds that experience would form, and lets that influence how they interact with the rest of the world. I won't be more specific to avoid spoilers, but the characters worked so hard and were on edge for so long, and I felt like Novik understood the types of relationships that would create in a far deeper and more complex way than most novels. There are several moments in The Golden Enclaves where I paused in reading to admire how perfect the character reactions were, and how striking the contrast was with people who hadn't been through what they went through. The series as a whole is chosen-one fantasy, and if you'd told me that before I read it, I would have grimaced. But this is more evidence (which I should have learned from the romance genre) that tropes, even ones that have been written many times, do not wear out, no matter what critics will try to tell you. There's always room for a great author to pick up the whole idea, turn it sideways, and say "try looking at it from this angle." This is boarding schools, chosen one, and coming of age, with the snarky first-person voice of urban fantasy, and it respects all of those story shapes, is aware of earlier work, and turns them all into something original, often funny, startlingly insightful, and thoroughly engrossing. 
I am aware that anything I like this much is probably accidentally aimed at my favorite ideas as a reader and my reaction may be partly idiosyncratic. I am not at all objective, and I'm sure not everyone will like it as much as I did. But wow did I ever like this book and this series. Just the best thing I've read in a very, very long time. Highly, highly recommended. (Start at the beginning!) Rating: 10 out of 10

16 October 2022

Aigars Mahinovs: Ryzen 7000 amdgpu boot hang

So you decided to build a brand new system using all the latest and coolest tech, and you buy a Ryzen 7000 series Zen 4 CPU, like the Ryzen 7700X that I picked, with a new motherboard and DDR5 memory and all that jazz. But for now, you don't yet have a fitting GPU for that system (as the new ones will only come out in November), so you are booting a Debian system using the new built-in video card of the new CPUs (the Zen 4 generation has a simple AMD GPU built into every CPU now; great stuff for debugging and mostly-headless systems) and you get ... nothing on the screen. Hmm. You boot into the rescue mode and the kernel messages stop after:
Oct 16 13:31:25 home kernel: [    4.128328] amdgpu: Ignoring ACPI CRAT on non-APU system
Oct 16 13:31:25 home kernel: [    4.128329] amdgpu: Virtual CRAT table created for CPU
Oct 16 13:31:25 home kernel: [    4.128332] amdgpu: Topology: Add CPU node
That looks bad, right? Well, if you either ssh into the machine or reboot with module_blacklist=amdgpu on the kernel command line, you will find those messages in /var/log/kern.log.1, and also the following messages that clarify the situation a bit:
Oct 16 13:31:25 home kernel: [    4.129352] amdgpu 0000:10:00.0: firmware: failed to load amdgpu/psp_13_0_5_toc.bin (-2)
Oct 16 13:31:25 home kernel: [    4.129354] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
Oct 16 13:31:25 home kernel: [    4.129358] amdgpu 0000:10:00.0: firmware: failed to load amdgpu/psp_13_0_5_toc.bin (-2)
Oct 16 13:31:25 home kernel: [    4.129359] amdgpu 0000:10:00.0: Direct firmware load for amdgpu/psp_13_0_5_toc.bin failed with error -2
Oct 16 13:31:25 home kernel: [    4.129360] amdgpu 0000:10:00.0: amdgpu: fail to request/validate toc microcode
Oct 16 13:31:25 home kernel: [    4.129361] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
Oct 16 13:31:25 home kernel: [    4.129432] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
Oct 16 13:31:25 home kernel: [    4.129525] amdgpu 0000:10:00.0: amdgpu: amdgpu_device_ip_init failed
Oct 16 13:31:25 home kernel: [    4.129526] amdgpu 0000:10:00.0: amdgpu: Fatal error during GPU init
Oct 16 13:31:25 home kernel: [    4.129527] amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Oct 16 13:31:25 home kernel: [    4.129633] amdgpu: probe of 0000:10:00.0 failed with error -2
So what you need is to get a new set of Linux kernel firmware blobs and unpack them into /lib/firmware. The tarball from 2022-10-12 worked well for me. After that you also need to re-create the initramfs with update-initramfs -k all -c to include the new firmware. Having kernel version 5.18 or newer is also required for stable Zen 4 support. It might be that a fresh Mesa version is also important, but as I am running sid on this machine, I can only say that the Mesa 22.2.1 in sid works fine.
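Concretely, the procedure looks roughly like this; the snapshot URL and date are examples, so substitute a current linux-firmware snapshot:
# fetch and unpack a linux-firmware snapshot (example date)
wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20221012.tar.gz
tar xf linux-firmware-20221012.tar.gz
# install the amdgpu blobs (psp_13_0_5_toc.bin and friends)
cp -rv linux-firmware-20221012/amdgpu /lib/firmware/
# rebuild all initramfs images so boot picks up the new firmware
update-initramfs -c -k all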

30 August 2022

John Goerzen: The PC & Internet Revolution in Rural America

Inspired by several others (such as Alex Schroeder's post and Szczeżuja's prompt), as well as a desire to get this down for my kids, I figure it's time to write a bit about living through the PC and Internet revolution where I did: outside a tiny town in rural Kansas. And, as I've been back in that same area for the past 15 years, I reflect some on the challenges that continue to play out. Although the stories from the others were primarily about getting online, I want to start by setting some background. Those of you that didn't grow up in the same era as I did probably never realized that a typical business PC setup might cost $10,000 in today's dollars, for instance. So let me start with the background.

Nothing was easy This story begins in the 1980s. Somewhere around my Kindergarten year of school, around 1985, my parents bought a TRS-80 Color Computer 2 (aka CoCo II). It had 64K of RAM and used a TV for display and sound. This got you the computer. It didn't get you any disk drive or anything, no joysticks (required by a number of games). So whenever the system powered down, or it hung and you had to power cycle it (a frequent event), you'd lose whatever you were doing and would have to re-enter the program, literally by typing it in. The floppy drive for the CoCo II cost more than the computer, and it was quite common for people to buy the computer first and then the floppy drive later when they'd saved up the money for that. I particularly want to mention that computers then didn't come with a modem. That would be like buying a laptop or a tablet without wifi today. A modem, which I'll talk about in a bit, was another expensive accessory. To cobble together a system in the 80s that was capable of talking to others, with persistent storage (floppy or hard drive), screen, keyboard, and modem, would be quite expensive. Adjusted for inflation, if you're talking a PC-style device (a clone of the IBM PC that ran DOS), this would easily be more expensive than the Macbook Pros of today. Few people back in the 80s had a computer at home. And the portion of those that had even the capability to get online in a meaningful way was even smaller. Eventually my parents bought a PC clone with 640K RAM and dual floppy drives. This was primarily used for my mom's work, but I did my best to take it over whenever possible. It ran DOS and, despite its monochrome screen, was generally a more capable machine than the CoCo II. For instance, it supported lowercase. (I'm not even kidding; the CoCo II pretty much didn't.) A while later, they purchased a 32MB hard drive for it. What luxury! Just getting a machine to work wasn't easy. Say you'd bought a PC, and then bought a hard drive, and a modem. You didn't just plug in the hard drive and have it work. You would have to fight it every step of the way. The BIOS and DOS partition tables of the day used a cylinder/head/sector method of addressing the drive, and various parts of those addresses had too few bits to work with the big drives of the day, above 20MB. So you would have to lie to the BIOS and fdisk in various ways, and sort of work out how to do it for each drive. For each peripheral (serial port, sound card in later years, etc.), you'd have to set jumpers for DMA and IRQs, hoping not to conflict with anything already in the system. Perhaps you can now start to see why USB and PCI were so welcomed.

Sharing and finding resources Despite the two computers in our home, it wasn't as if software written on one machine just ran on another. A lot of software for PC clones assumed a CGA color display. The monochrome HGC in our PC wasn't particularly compatible. You could find a TSR program to emulate the CGA on the HGC, but it wasn't particularly stable, and there's only so much you can do when a program that assumes color displays on a monitor that can only show black, dark amber, or light amber. So I'd periodically get to use other computers, most commonly at an office in the evening when it wasn't being used. There were some local computer clubs that my dad took me to periodically. Software was swapped back then; disks copied, shareware exchanged, and so forth. For me, at least, there was no "online" to download software from, and selling software over the Internet wasn't a thing at all.

Three Different Worlds There were sort of three different worlds of computing experience in the 80s:
  1. Home users. Initially using a wide variety of software from Apple, Commodore, Tandy/RadioShack, etc., but eventually coming to be mostly dominated by IBM PC clones
  2. Small and mid-sized business users. Some of them had larger minicomputers or small mainframes, but most that I had contact with by the early 90s were standardized on DOS-based PCs. More advanced ones had a network running Netware, most commonly. Networking hardware and software was generally too expensive for home users to use in the early days.
  3. Universities and large institutions. These are the places that had the mainframes, the earliest implementations of TCP/IP, the earliest users of UUCP, and so forth.
The difference between the home computing experience and the large institution experience was vast. Not only in terms of dollars (the large institution hardware could easily cost anywhere from tens of thousands to millions of dollars) but also in terms of sheer resources required (large rooms, enormous power circuits, support staff, etc). Nothing was in common between them; not operating systems, not software, not experience. I was never much aware of the third category until the differences started to collapse in the mid-90s, and even then I was only exposed to it once the collapse was well underway. You might say to me, "Well, Google certainly isn't running what I'm running at home!" And, yes of course, it's different. But fundamentally, most large datacenters are running on x86_64 hardware, with Linux as the operating system, and a TCP/IP network. It's a different scale, obviously, but at a fundamental level, the hardware and operating system stack are pretty similar to what you can readily run at home. Back in the 80s and 90s, this wasn't the case. TCP/IP wasn't even available for DOS or Windows until much later, and when it was, it was a clunky beast that was difficult. One of the things Kevin Driscoll highlights in his book The Modem World (see my short post about it) is that the history of the Internet we usually receive is focused on case 3: the large institutions. In reality, the Internet was and is literally a network of networks. Gateways to and from the Internet existed from all three kinds of users for years, and while TCP/IP ultimately won the battle of the internetworking protocol, the other two streams of users also shaped the Internet as we now know it. Like many, I had no access to the large institution networks, but as I've been reflecting on my experiences, I've found a new appreciation for the way that those of us that grew up with primarily home PCs also shaped the evolution of today's online world.

An Era of Scarcity I should take a moment to comment about the cost of software back then. A newspaper article from 1985 comments that WordPerfect, then the most powerful word processing program, sold for $495 (or $219 if you could score a mail order discount). That's $1360/$600 in 2022 money. Other popular software, such as Lotus 1-2-3, was up there as well. If you were to buy a new PC clone in the mid to late 80s, it would often cost $2000 in 1980s dollars. Now add a printer: a low-end dot matrix for $300, or a laser for $1500 or even more. A modem: another $300. So the basic system would be $3600, or $9900 in 2022 dollars. If you wanted a nice printer, you're now pushing well over $10,000 in 2022 dollars. You start to see one barrier here, and also why things like shareware and piracy (if it was indeed even recognized as such) were common in those days. So you can see: going from a home computer setup (TRS-80, Commodore C64, Apple ][, etc.) to a business-class PC setup was an order of magnitude increase in cost. From there to the high-end minis/mainframes was another order of magnitude (at least!) increase. Eventually there was price pressure on the higher end and things all got better, which is probably why the non-DOS PCs lasted until the early 90s.

Increasing Capabilities My first exposure to computers in school was in the 4th grade, when I would have been about 9. There was a single Apple ][ machine in that room. I primarily remember playing Oregon Trail on it. The next year, the school added a computer lab. Remember, this is a small rural area, so each graduating class might have about 25 people in it; this lab was shared by everyone in the K-8 building. It was full of some flavor of IBM PS/2 machines running DOS and Netware. There was a dedicated computer teacher too, though I think she was a regular teacher that was given somewhat minimal training on computers. We were going to learn typing that year, but I did so well on the very first typing program that we soon worked out that I could do programming instead. I started going to school early (these machines were far more powerful than the XT at home) and worked on programming projects there. Eventually my parents bought me a Gateway 486SX/25 with a VGA monitor and hard drive. Wow! This was a whole different world. It may have come with Windows 3.0 or 3.1 on it, but I mainly remember running OS/2 on that machine. More on that below.

Programming That CoCo II came with a BASIC interpreter in ROM. It came with a large manual, which served as a BASIC tutorial as well. The BASIC interpreter was also the shell, so literally you could not use the computer without at least a bit of BASIC. Once I had access to a DOS machine, it also had a BASIC interpreter: GW-BASIC. There was a fair bit of software written in BASIC at the time, but most of the more advanced software wasn't. I wondered how these .EXE and .COM programs were written. I could find vague references to DEBUG.EXE, assemblers, and such. But it wasn't until I got a copy of Turbo Pascal that I was able to do that sort of thing myself. Eventually I got Borland C++ and taught myself C as well. A few years later, I wanted to try writing GUI programs for Windows, and bought Watcom C++ (much cheaper than the competition), and it could target Windows, DOS (and I think even OS/2). Notice that, aside from BASIC, none of this was free, and none of it was bundled. You couldn't just download a C compiler, or Python interpreter, or whatnot back then. You had to pay for the ability to write any kind of serious code on the computer you already owned.

The Microsoft Domination Microsoft came to dominate the PC landscape, and then even the computing landscape as a whole. IBM very quickly lost control over the hardware side of PCs as Compaq and others made clones, but Microsoft has managed (in varying degrees, even to this day) to keep a stranglehold on the software, and especially the operating system, side. Yes, there was occasional talk of things like DR-DOS, but by and large the dominant platform came to be the PC, and if you had a PC, you ran DOS (and later Windows) from Microsoft. For a while, it looked like IBM was going to challenge Microsoft on the operating system front; they had OS/2, and when I switched to it sometime around the version 2.1 era in 1993, it was unquestionably more advanced technically than the consumer-grade Windows from Microsoft at the time. It had Internet support baked in, could run most DOS and Windows programs, and had introduced a replacement for the by-then terrible FAT filesystem: HPFS, in 1988. Microsoft wouldn't introduce a better filesystem for its consumer operating systems until Windows XP in 2001, 13 years later. But more on that story later.

Free Software, Shareware, and Commercial Software I've covered the high cost of software already. Obviously $500 software wasn't going to sell in the home market. So what did we have? Mainly, these things:
  1. Public domain software. It was free to use, and if implemented in BASIC, probably had source code with it too.
  2. Shareware
  3. Commercial software (some of it from small publishers was a lot cheaper than $500)
Let's talk about shareware. The idea with shareware was that a company would release a useful program, sometimes limited. You were encouraged to "register", or pay for, it if you liked it and used it. And, regardless of whether you registered it or not, you were told "please copy!" Sometimes shareware was fully functional, and registering it got you nothing more than printed manuals and an easy conscience (guilt trips for not registering weren't necessarily very subtle). Sometimes unregistered shareware would have a nag screen: a delay of a few seconds while they told you to register. Sometimes they'd be limited in some way; you'd get more features if you registered. With games, it was popular to have a trilogy, and release the first episode (inevitably ending with a cliffhanger) as shareware, and the subsequent episodes would require registration. In any event, a lot of software people used in the 80s and 90s was shareware. Also pirated commercial software, though in the earlier days of computing, I think some people didn't even know the difference. Notice what's missing: Free Software / FLOSS in the Richard Stallman sense of the word. Stallman lived in the big institution world (after all, he worked at MIT), and what he was doing with the Free Software Foundation and GNU project beginning in 1983 never really filtered into the DOS/Windows world at the time. I had no awareness of it even existing until into the 90s, when I first started getting some hints of it as a port of gcc became available for OS/2. The Internet was what really brought this home, but I'm getting ahead of myself. I want to say again: FLOSS never really entered the DOS and Windows 3.x ecosystems. You'd see it make a few inroads here and there in later versions of Windows, and more so now that Microsoft has been sort of forced to accept it, but still, reflect on its legacy. What is the software market like in Windows compared to Linux, even today? Now it is, finally, time to talk about connectivity!

Getting On-Line What does it even mean to "get on line"? Certainly not connecting to a wifi access point. The answer is, unsurprisingly, complex. But for everyone except the large institutional users, it begins with a telephone.

The telephone system By the 80s, there was one communication network that already reached into nearly every home in America: the phone system. Virtually every household (note I don't say every person) was uniquely identified by a 10-digit phone number. You could, at least in theory, call up virtually any other phone in the country and be connected in less than a minute. But I've got to talk about cost. The way things worked in the USA, you paid a monthly fee for a phone line. Included in that monthly fee was unlimited "local" calling. What is a local call? That was an extremely complex question. Generally it meant, roughly, calling within your city. But of course, as you deal with things like suburbs and cities growing into each other (eg, the Dallas-Ft. Worth metroplex), things got complicated fast. But let's just say for simplicity you could call others in your city. What about calling people not in your city? That was "long distance", and you paid, often hugely, by the minute for it. Long distance rates were difficult to figure out, but were generally most expensive during business hours and cheapest at night or on weekends. Prices eventually started to come down when competition was introduced for long distance carriers, but even then you often were stuck with a single carrier for long distance calls outside your city but within your state. Anyhow, let's just leave it at this: local calls were virtually free, and long distance calls were extremely expensive.

Getting a modem I remember getting a modem that ran at either 1200bps or 2400bps. Either way, quite slow; you could often read even plain text faster than the modem could display it. But what was a modem? A modem hooked up to a computer with a serial cable, and to the phone system. By the time I got one, modems could automatically dial and answer. You would send a command like ATDT5551212 and it would dial 555-1212. Modems had speakers, because often things wouldn't work right, and the telephone system was oriented around speech, so you could hear what was happening. You'd hear it wait for dial tone, then dial, then hopefully the remote end would ring, a modem there would answer, you'd hear the screeching of a handshake, and eventually your terminal would say CONNECT 2400. Now your computer was bridged to the other; anything going out your serial port was encoded as sound by your modem and decoded at the other end, and vice-versa. But what, exactly, was the other end? It might have been another person at their computer. Turn on local echo, and you can see what they did. Maybe you'd send files to each other. But in my case, the answer was different: PC Magazine.

PC Magazine and CompuServe Starting around 1986 (so I would have been about 6 years old), I got to read PC Magazine. My dad would bring home copies that were being discarded at his office for me to read, and I think eventually bought me a subscription directly. This was not just a standard magazine; it ran something like 350-400 pages an issue, and came out every other week. This thing was a monster. It had reviews of hardware and software, descriptions of upcoming technologies, pages and pages of ads (that often had some degree of being informative to them). And they had sections on programming. Many issues would talk about BASIC or Pascal programming, and there'd be a utility in most issues. What do I mean by "a utility in most issues"? Did they include a floppy disk with software? No, of course not. There was a literal program listing printed in the magazine. If you wanted the utility, you had to type it in. And a lot of them were written in assembler, so you had to have an assembler. An assembler, of course, was not free and I didn't have one. Or maybe they wrote it in Microsoft C, and I had Borland C, and (of course) they weren't compatible. Sometimes they would list the program sort of in binary: line after line of a BASIC program, with lines like "64, 193, 253, 0, 53, 0, 87" that you would type in for hours, hopefully correctly. Running the BASIC program would, if you got it correct, emit a .COM file that you could then run. They did have a rudimentary checksum system built in, but it wasn't even a CRC, so something like swapping two numbers you'd never notice, except when the program would mysteriously hang. Eventually they teamed up with CompuServe to offer a limited slice of CompuServe for the purpose of downloading PC Magazine utilities. This was called PC MagNet. I am foggy on the details, but I believe that for a time you could connect to the limited PC MagNet part of CompuServe for free (after the cost of the long-distance call, that is) rather than paying for CompuServe itself (because, OF COURSE, that also charged you by the minute). So in the early days, I would get special permission from my parents to place a long distance call, and after some nerve-wracking minutes in which we were aware every minute was racking up charges, I could navigate the menus, download what I wanted, and log off immediately. I still, incidentally, mourn what PC Magazine became. As with computing generally, it followed the mass market. It lost its deep technical chops, cut its programming columns, stopped talking about things like how SCSI worked, and so forth. By the time it stopped printing in 2009, it was no longer a square-bound 400-page behemoth, but rather looked more like a copy of Newsweek, but with less depth.

Continuing with CompuServe CompuServe was a much larger service than just PC MagNet. Eventually, our family got a subscription. It was still an expensive and scarce resource; I'd call it only after hours, when the long-distance rates were cheapest. Everyone had a numerical username separated by commas; mine was 71510,1421. CompuServe had forums, and files. Eventually I would use TapCIS to queue up things I wanted to do offline, to minimize phone usage online. CompuServe eventually added a gateway to the Internet. For the sum of somewhere around $1 a message, you could send or receive an email from someone with an Internet email address! I remember the thrill of one time, as a kid of probably 11 years, sending a message to one of the editors of PC Magazine and getting a kind, if brief, reply back! But inevitably I had...

The Godzilla Phone Bill Yes, one month I became lax in tracking my time online. I ran up my parents' phone bill. I don't remember how high, but I remember it was hundreds of dollars, a hefty sum at the time. As I watched Jason Scott's BBS Documentary, I realized how common an experience this was. I think this was the end of CompuServe for me for a while.

Toll-Free Numbers I lived near a town with a population of 500. Not even IN town, but near town. The calling area included another town with a population of maybe 1500, so all told, there were maybe 2000 people total I could talk to with a local call (though far fewer numbers, because remember, telephones were allocated by the household). There were, as far as I know, zero modems that were a local call (aside from one that belonged to a friend I met in around 1992). So basically everything was long-distance. But there was a special feature of the telephone network: toll-free numbers. Normally when calling long-distance, you, the caller, paid the bill. But with a toll-free number, beginning with 1-800, the recipient paid the bill. These numbers almost inevitably belonged to corporations that wanted to make it easy for people to call. Sales and ordering lines, for instance. Some of these companies started to set up modems on toll-free numbers. There were few of these, but they existed, so of course I had to try them! One of them was a company called PennyWise that sold office supplies. They had a toll-free line you could call with a modem to order stuff. Yes, online ordering before the web! I loved office supplies. And, because I lived far from a big city, if the local K-Mart didn't have it, I probably couldn't get it. Of course, the interface was entirely text, but you could search for products and place orders with the modem. I had loads of fun exploring the system, and actually ordered things from them, and probably actually saved money doing so. With the first order they shipped a monster full-color catalog. That thing must have been 500 pages, like the Sears catalogs of the day. Every item had a part number, which streamlined ordering through the modem.

Inbound FAXes By the 90s, a number of modems became able to send and receive FAXes as well. For those that don't know, a FAX machine was essentially a special modem. It would scan a page and digitally transmit it over the phone system, where it would (at least in the early days) be printed out in real time, because the machines didn't have the memory to store an entire page as an image. Eventually, PC modems integrated FAX capabilities. There still wasn't anything useful I could do locally, but there were ways I could get other companies to FAX something to me. I remember two of them. One was for US Robotics. They had an "on demand" FAX system. You'd call up a toll-free number, which was an automated IVR system. You could navigate through it and select various documents of interest to you: spec sheets and the like. You'd key in your FAX number, hang up, and US Robotics would call YOU and FAX you the documents you wanted. Yes! I was talking to a computer (of a sort) at no cost to me! The New York Times also ran a service for a while called TimesFax. Every day, they would FAX out a page or two of summaries of the day's top stories. This was pretty cool in an era in which I had no other way to access anything from the New York Times. I managed to sign up for TimesFax (I have no idea how, anymore) and for a while I would get a daily FAX of their top stories. When my family got its first laser printer, I could then even print these FAXes complete with the gothic New York Times masthead. Wow! (OK, so technically I could print them on a dot-matrix printer also, but graphics on a 9-pin dot matrix is a kind of pain that is a whole other article.)

My own phone line Remember how I discussed that phone lines were allocated per household? This was a problem for a lot of reasons:
  1. Anybody that tried to call my family while I was using my modem would get a busy signal (unable to complete the call)
  2. If anybody in the house picked up the phone while I was using it, that would degrade the quality of the ongoing call and either mess up or disconnect the call in progress. In many cases, that could cancel a file transfer (which wasn't necessarily easy or possible to resume), prompting howls of annoyance from me.
  3. Generally we all had to work around each other
So eventually I found various small jobs and used the money I made to pay for my own phone line and my own long distance costs. Eventually I even upgraded to a 28.8Kbps US Robotics Courier modem! Yes, you heard it right: I got a job and a bank account so I could have a phone line and a faster modem. Uh, isn't that why every teenager gets a job? Now my local friend and I could call each other freely, at least on my end (I can't remember if he had his own phone line too). We could exchange files using HS/Link, which had the added benefit of allowing split-screen chat even while a file transfer was in progress. I'm sure we spent hours chatting to each other keyboard-to-keyboard while sharing files with each other.

Technology in Schools By this point in the story, we're in the late 80s and early 90s. I'm still using PC-style OSs at home; OS/2 in the later years of this period, DOS or maybe a bit of Windows in the earlier years. I mentioned that they let me work on programming at school starting in 5th grade. It was soon apparent that I knew more about computers than anybody on staff, and I started getting pulled out of class to help teachers or administrators with vexing school problems. This continued until I graduated from high school, incidentally often to my enjoyment, and the annoyance of one particular teacher who, I must say, I was fine with annoying in this way. That's not to say that there was institutional support for what I was doing. It was, after all, a small school. Larger schools might have introduced BASIC or maybe Logo in high school. But I had already taught myself BASIC, Pascal, and C by the time I was somewhere around 12 years old. So I wouldn't have had any use for that anyhow. There were programming contests occasionally held in the area. Schools would send teams. My school didn't really send anybody, but I went as an individual. One of them was run by a local college (but for jr. high and high school students). Years later, I met one of the professors that ran it. He remembered me, and that day, better than I did. The programming contest had problems one could solve in BASIC or Logo. I knew nothing about what to expect going into it, but I had lugged my computer and screen along, and asked him, "Can I write my solutions in C?" He was, apparently, stunned, but said sure, go for it. I took first place that day, leading to some rather confused teams from much larger schools. The Netware network that the school had was, as these generally were, itself isolated. There was no link to the Internet or anything like it. Several schools across three local counties eventually invested in a fiber-optic network linking them together. This built a larger, but still closed, network. Its primary purpose was to allow students to be exposed to a wider variety of classes at high schools. Participating schools had an "ITV room", outfitted with cameras and mics. So students at any school could take classes offered over ITV at other schools. For instance, only my school taught German classes, so people at any of those participating schools could take German. It was an early Zoom room. But alongside the TV signal, there was enough bandwidth to run some Netware frames. By about 1995 or so, this let one of the schools purchase some CD-ROM software that was made available on a file server and could be accessed by any participating school. Nice! But Netware was mainly about file and printer sharing; there wasn't even a facility like email, at least not on our deployment.

BBSs My last hop before the Internet was the BBS. A BBS was a computer program, usually run by a hobbyist like me, on a computer with a modem connected. Callers would call it up, and they'd interact with the BBS. Most BBSs had discussion groups like forums and file areas. Some also had games. I, of course, continued to have that most vexing of problems: they were all long-distance. There were some ways to help with that, chiefly QWK and BlueWave. These, somewhat like TapCIS in the CompuServe days, let me download new message posts for reading offline, and queue up my own messages to send later. QWK and BlueWave didn't help with file downloading, though.

BBSs get networked BBSs were an interesting thing. You'd call up one, and inevitably somewhere in the file area would be a BBS list. Download the BBS list and you've suddenly got a list of phone numbers to try calling. All of them were long distance, of course. You'd try calling them at random and have a success rate of maybe 20%. The other 80% would be defunct; you might get the dreaded "this number is no longer in service" recording, or the even more dreaded angry human answering the phone (and of course a modem can't talk to a human, so they'd just get silence, for probably the nth time that week). The phone company cared nothing about BBSs and recycled their numbers just as fast as any others. To talk to various people, or participate in certain discussion groups, you'd have to call specific BBSs. That's annoying enough in the general case, but even more so for someone paying long distance for it all, because it takes a few minutes to establish a connection to a BBS: handshaking, logging in, menu navigation, etc. But BBSs started talking to each other. The earliest successful such effort was FidoNet, and for the duration of the BBS era, it remained by far the largest. FidoNet was analogous to the UUCP that the institutional users had, but ran on the much cheaper PC hardware. Basically, BBSs that participated in FidoNet would relay email, forum posts, and files between themselves overnight. Eventually, as with UUCP, by hopping through this network, messages could reach around the globe, and forums could have worldwide participation asynchronously, long before they could link to each other directly via the Internet. It was almost entirely volunteer-run.
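That overnight relaying is simple enough to sketch. Here is a minimal conceptual model in Python; the node names, routing table, and function names are all invented for illustration (real FidoNet used zone:net/node addresses, mailer software, and scheduled transfer windows, none of which is modeled here):

    from collections import defaultdict

    class Node:
        """One BBS participating in the network."""
        def __init__(self, name, next_hop):
            self.name = name
            self.next_hop = next_hop            # destination -> neighbour to hand it to
            self.outbound = defaultdict(list)   # neighbour -> queued (dest, text) pairs
            self.inbox = []

        def post(self, dest, text):
            # A message is either for us, or queued for the next hop.
            if dest == self.name:
                self.inbox.append(text)
            else:
                self.outbound[self.next_hop[dest]].append((dest, text))

    def nightly_exchange(nodes):
        """Once a night, queued bundles move exactly one hop closer."""
        transfers = []
        for node in nodes.values():             # snapshot first, so no message
            for neighbour, bundle in node.outbound.items():   # skips ahead a hop
                transfers.append((neighbour, list(bundle)))
                bundle.clear()
        for neighbour, bundle in transfers:
            for dest, text in bundle:
                nodes[neighbour].post(dest, text)

    # A three-node chain: kansas <-> hub <-> coast
    nodes = {
        "kansas": Node("kansas", {"hub": "hub", "coast": "hub"}),
        "hub":    Node("hub",    {"kansas": "kansas", "coast": "coast"}),
        "coast":  Node("coast",  {"kansas": "hub", "hub": "hub"}),
    }
    nodes["kansas"].post("coast", "Hello from a town of 500!")
    nightly_exchange(nodes)                     # night 1: kansas -> hub
    nightly_exchange(nodes)                     # night 2: hub -> coast
    print(nodes["coast"].inbox)                 # ['Hello from a town of 500!']

Two nights for two hops, which is exactly why FidoNet mail to the other side of the globe took days, and also why it was so cheap: every call was a short, usually nearby, off-peak transfer.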

Running my own BBS At age 13, I eventually chose to set up my own BBS. It ran on my single phone line, so of course when I was dialing up something else, nobody could dial up me. Not that this was a huge problem; in my town of 500, I probably had a good 1 or 2 regular callers in the beginning. In the PC era, there was a big difference between a server and a client. Server-class software was expensive and rare. Maybe in later years you had an email client, but an email server would be completely unavailable to you as a home user. But with a BBS, I could effectively run a server. I even ran serial lines in our house so that the BBS could be connected to from other rooms! Since I was running OS/2, the BBS didn't tie up the computer; I could continue using it for other things. FidoNet had an Internet email gateway. This one, unlike CompuServe's, was free. Once I had a BBS on FidoNet, you could reach me from the Internet using the FidoNet address. This didn't support attachments, but then email of the day didn't really, either. Various others outside Kansas ran FidoNet distribution points. I believe one of them was mgmtsys; my memory is quite vague, but I think they offered a direct gateway and I would call them to pick up Internet mail via FidoNet protocols, but I'm not at all certain of this.
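The gateway worked by folding the FidoNet node numbers into an Internet hostname. A tiny sketch of the classic fidonet.org convention, as it is commonly described (the name and node numbers below are invented for illustration):

    def fidonet_gateway_address(user, zone, net, node, point=0):
        """Build the Internet form of a FidoNet address, following the
        classic fidonet.org gateway convention (illustrative only)."""
        host = f"f{node}.n{net}.z{zone}.fidonet.org"
        if point:
            host = f"p{point}." + host
        return user.lower().replace(" ", ".") + "@" + host

    # e.g. "John Doe" at FidoNet 1:234/56 -> john.doe@f56.n234.z1.fidonet.org
    print(fidonet_gateway_address("John Doe", 1, 234, 56))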

Pros and Cons of the Non-Microsoft World As mentioned, Microsoft was and is the dominant operating system vendor for PCs. But I left that world in 1993, and here, nearly 30 years later, have never really returned. I got an operating system with more technical capabilities than the DOS and Windows of the day, but the tradeoff was a much smaller software ecosystem. OS/2 could run DOS programs, but it ran OS/2 programs a lot better. So if I were to run a BBS, I wanted one that had a native OS/2 version, limiting me to a small fraction of available BBS server software. On the other hand, since OS/2 was a fully 32-bit operating system, there started to be OS/2 ports of certain software with a Unix heritage; most notably for me at the time, gcc. At some point, I eventually came across the RMS essays and started to be hooked.

Internet: The Hunt Begins I certainly was aware that the Internet was out there and interesting. But the first problem was: how the heck do I get connected to the Internet?

Computer labs There was one place that tended to have Internet access: colleges and universities. In 7th grade, I participated in a program that resulted in me being invited to visit Duke University, and in 8th grade, I participated in National History Day, resulting in a trip to visit the University of Maryland. I probably sought out computer labs at both of those. My most distinct memory was finding my way into a computer lab at one of those universities, and it was full of NeXT workstations. I had never seen or used a NeXT before, and had no idea how to operate it. I had brought a box of floppy disks, unaware that the DOS disks probably weren't compatible with NeXT. Closer to home, a small college had a computer lab that I could also visit. I would go there in summer or when it wasn't used, with my stack of floppies. I remember downloading disk images of FLOSS operating systems: FreeBSD, Slackware, or Debian, at the time. The hash marks from the DOS-based FTP client would creep across the screen as the 1.44MB disk images slowly downloaded. telnet was also available on those machines, so I could telnet to things like public-access Archie servers and libraries, though not Gopher. Still, FTP and telnet access opened up a lot, and I learned quite a bit in those years.

Continuing the Journey At some point, I got a copy of the Whole Internet User's Guide and Catalog, published in 1994. I still have it. If I hadn't already figured it out by then, I certainly became aware from it that Unix was the dominant operating system on the Internet. The examples in Whole Internet covered FTP, telnet, and gopher, all assuming the user somehow got to a Unix prompt. The web was introduced about 300 pages in; clearly viewed as something that wasn't page 1 material. And it covered the command-line www client before introducing the graphical Mosaic. Even then, though, the book highlighted Mosaic's utility as a front-end for Gopher and FTP, and even the ability to launch telnet sessions by clicking on links. But having a copy of the book didn't equate to having any way to run Mosaic. The machines in the computer lab I mentioned above all ran DOS and were incapable of running a graphical browser. I had no SLIP or PPP (both ways to run Internet traffic over a modem) connectivity at home. In short, the Web was something for the large institutional users at the time.

CD-ROMs As CD-ROMs came out, with their huge (for the day) 650MB capacity, various companies started collecting software that could be downloaded on the Internet and selling it on CD-ROM. The two most popular ones were Walnut Creek CD-ROM and Infomagic. One could buy extensive Shareware and gaming collections, and then even entire Linux and BSD distributions. Although not exactly an Internet service per se, it was a way of bringing what may ordinarily only be accessible to institutional users into the home computer realm.

Free Software Jumps In As I mentioned, by the mid 90s, I had come across RMS's writings about free software, most probably his 1992 essay "Why Software Should Be Free". (Please note, this is not a commentary on the more recently-revealed issues surrounding RMS, but rather his writings and work as I encountered them in the 90s.) The notion of a Free operating system, not just in cost but in openness, was incredibly appealing. Not only could I tinker with it to a much greater extent due to having source for everything, but it included so much software that I'd otherwise have to pay for. Compilers! Interpreters! Editors! Terminal emulators! And, especially, server software of all sorts. There'd be no way I could afford or run Netware, but with a Free Unixy operating system, I could do all that. My interest was obviously piqued. Add to that the fact that I could actually participate and contribute, and I was about to become hooked on something that I've stayed hooked on for decades. But then the question was: which Free operating system? Eventually I chose FreeBSD to begin with; that would have been sometime in 1995. I don't recall the exact reasons for that. I remember downloading Slackware install floppies, and probably the fact that Debian wasn't yet at 1.0 scared me off for a time. FreeBSD's fantastic Handbook, far better than anything I could find for Linux at the time, was no doubt also a factor.

The de Raadt Factor Why not NetBSD or OpenBSD? The short answer is Theo de Raadt. Somewhere in this time, when I was somewhere between 14 and 16 years old, I asked some questions comparing NetBSD to the other two free BSDs. This was on a NetBSD mailing list, but for some reason Theo saw it and got a flame war going, which CC'd me. Now keep in mind that even if NetBSD had a web presence at the time, it would have been minimal, and I would have, not at all unusually for the time, had no way to access it. I was certainly not aware of the, shall we say, acrimony between Theo and NetBSD. While I had certainly seen an online flamewar before, this took on a different and more disturbing tone; months later, Theo randomly emailed me under the subject "SLIME" saying that I was, well, "SLIME". I seem to recall periodic emails from him thereafter reminding me that he hates me and that he had blocked me. (Disclaimer: I have poor email archives from this period, so the full details are lost to me, but I believe I am accurately conveying these events from over 25 years ago.) This was a surprise, and an unpleasant one. I was trying to learn, and while it is possible I didn't understand some aspect or other of netiquette (or Theo's personal hatred of NetBSD) at the time, still that is not a reason to flame a 16-year-old (though he would have had no way to know my age). This didn't leave any kind of scar, but did leave a lasting impression; to this day, I am particularly concerned with how FLOSS projects handle poisonous people. Debian, for instance, has come a long way in this over the years, and even Linus Torvalds has turned over a new leaf. I don't know if Theo has. In any case, I didn't use NetBSD then. I did try it periodically in the years since, but never found it compelling enough to justify a large switch from Debian. I never tried OpenBSD for various reasons, but one of them was that I didn't want to join a community that tolerates behavior such as Theo's from its leader.

Moving to FreeBSD Moving from OS/2 to FreeBSD was final. That is, I didn't have enough hard drive space to keep both. I also didn't have the backup capacity to back up OS/2 completely. My BBS, which ran Virtual BBS (and at some point also AdeptXBBS), was deleted and reincarnated in a different form. My BBS was a member of both FidoNet and VirtualNet; the latter was specific to VBBS, and had to be dropped. I believe I may have also had to drop the FidoNet link for a time. This was the biggest change of computing in my life to that point. The earlier experiences hadn't literally destroyed what came before. OS/2 could still run my DOS programs. Its command shell was quite DOS-like. It ran Windows programs. I was going to throw all that away and leap into the unknown. I wish I had saved a copy of my BBS; I would love to see the messages I exchanged back then, or see its menu screens again. I have little memory of what it looked like. But other than that, I have no regrets. Pursuing Free, Unixy operating systems brought me a lot of enjoyment and a good career. That's not to say it was easy. All the problems of not being in the Microsoft ecosystem were magnified under FreeBSD and Linux. In a day before EDID, monitor timings had to be calculated manually, and you risked destroying your monitor if you got them wrong. Word processing and spreadsheet software was pretty much not there for FreeBSD or Linux at the time; I was therefore forced to learn LaTeX and actually appreciated that. Software like PageMaker or CorelDraw was certainly nowhere to be found for those free operating systems either. But I got a ton of new capabilities. I mentioned the BBS didn't shut down, and indeed it didn't. I ran what was surely a supremely unique oddity: a free, dial-in Unix shell server in the middle of a small town in Kansas. I'm sure I provided things such as pine for email and some help text and maybe even printouts for how to use it. The set of callers slowly grew over the time period, in fact. And then I got UUCP.

Enter UUCP Even throughout all this, there was no local Internet provider and things were still long distance. I had Internet email access via assorted routes, but they were all strange. And, I wanted access to Usenet. In 1995, it happened. The local ISP I mentioned offered UUCP access. Though I couldn't afford the dialup shell (or later, SLIP/PPP) that they offered due to long-distance costs, UUCP's very efficient batched processes looked doable. I believe I established that link when I was 15, so in 1995. I worked to register my domain, complete.org, as well. At the time, the process was a bit lengthy and involved downloading a text file form, filling it out in a precise way, sending it to InterNIC, and probably mailing them a check. Well, I did that, and in September of 1995, complete.org became mine. I set up sendmail on my local system, as well as INN to handle the limited Usenet newsfeed I requested from the ISP. I even ran Majordomo to host some mailing lists, including some that were surprisingly high-traffic for a few-times-a-day, long-distance, modem UUCP link! The modem client programs for FreeBSD were somewhat less advanced than for OS/2, but I believe I wound up using Minicom or Seyon to continue to dial out to BBSs and, I believe, continue to use Learning Link. So all the while I was setting up my local BBS, I continued to have access to the text Internet, consisting chiefly of Gopher for me.

Switching to Debian I switched to Debian sometime in 1995 or 1996, and have been using Debian as my primary OS ever since. I continued to offer shell access, but added the WorldVU Atlantis menuing BBS system. This provided a return to a more BBS-like interface (by default; shell was still an option) as well as some BBS door games such as LoRD and TradeWars 2002, running under DOS emulation. I also continued to run INN, and ran ifgate to allow FidoNet echomail to be presented in INN as Usenet-like newsgroups, and netmail to be gated to Unix email. This worked pretty well. The BBS continued to grow in these days, peaking at about two dozen total user accounts, and maybe a dozen regular users.

Dial-up access availability I believe it was in 1996 that dial-up PPP access finally became available in my small town. What a thrill! FINALLY! I could now FTP, use Gopher, telnet, and the web, all from home. Of course, it was at modem speeds, but still. (Strangely, I have a memory of accessing the Web using WebExplorer from OS/2. I don't know exactly why; it's possible that by this time, I had upgraded to a 486 DX2/66 and was able to reinstall OS/2 on the old 25MHz 486, or maybe something was wrong with the timeline from my memories from 25 years ago above. Or perhaps I made the occasional long-distance call somewhere before I ditched OS/2.) Gopher sites still existed at this point, and I could access them using Netscape Navigator, which likely became my standard Gopher client at that point. I don't recall using the UMN text-mode gopher client locally at that time, though it's certainly possible I did.

The city Starting when I was 15, I took computer science classes at Wichita State University. The first one was a class in the summer of 1995 on C++. I remember being worried about being good enough for it; I was, after all, just past my HS freshman year and had never taken the prerequisite C class. I loved it and got an A! By 1996, I was taking more classes. In 1996 or 1997 I stayed in Wichita during the day due to having more than one class. So, what would I do then but enjoy the computer lab? The CS dept. had two of them: one that had NCD X terminals connected to a pair of SunOS servers, and another one running Windows. I spent most of the time in the Unix lab with the NCDs; I'd use Netscape or pine, write code, enjoy the University's fast Internet connection, and so forth. In 1997 I graduated high school, and that summer I moved to Wichita to attend college. As was so often the case, I shut down the BBS at that time. It would be 5 years until I again dealt with Internet at home in a rural community. By the time I moved to my apartment in Wichita, I had stopped using OS/2 entirely. I have no memory of ever having OS/2 there. Along the way, I had bought a Pentium 166, and then the most expensive piece of computing equipment I have ever owned: a DEC Alpha, which, of course, ran Linux.

ISDN I must have used dialup PPP for a time, but I eventually got a job working for the ISP I had used for UUCP, and then PPP. While there, I got a 128Kbps ISDN line installed in my apartment, and they gave me a discount on the service for it. That was around 3x the speed of a modem, and crucially was always on and gave me a public IP. No longer did I have to use UUCP; now I got to host my own things! By at least 1998, I was running a web server on www.complete.org, and I had an FTP server going as well.

Even Bigger Cities In 1999 I moved to Dallas, and there got my first broadband connection: an ADSL link at, I think, 1.5Mbps! Now that was something! But it had some reliability problems. I eventually put together a server and had it hosted at an acquaintance's place who had SDSL in his apartment. Within a couple of years, I had switched to various kinds of proper hosting for it, but that is a whole other article. In Indianapolis, I got a cable modem for the first time, with even better speeds but prohibitions on running servers on it. Yuck.

Challenges Being non-Microsoft continued to have challenges. Until the advent of Firefox, a web browser was one of the biggest. While Netscape supported Linux on i386, it didn't support Linux on Alpha. I hobbled along with various attempts at emulators, old versions of Mosaic, and so forth. And, until StarOffice was open-sourced as OpenOffice, reading Microsoft file formats was also a challenge, though WordPerfect was briefly available for Linux. Over the years, I have become used to the Linux ecosystem. Perhaps I use Gimp instead of Photoshop and digiKam instead of, well, whatever somebody would use on Windows. But I get ZFS, and containers, and so much that isn't available there. Yes, I know Apple never went away and is a thing, but for most of the time period I discuss in this article, at least after the rise of DOS, it was niche compared to the PC market.

Back to Kansas In 2002, I moved back to Kansas, to a rural home near a different small town in the county next to where I grew up. Out there, it was back to dialup at home, though I had faster access at work. I didn't much care for this, and thus began a 20+-year effort to get broadband in the country. At first, I got a wireless link, which worked well enough in the winter, but had serious problems in the summer when the trees leafed out. Eventually DSL became available locally: highly unreliable, but still, it was something. Then I moved back to the community I grew up in, a few miles from where I grew up. Again I got DSL, a bit better this time. But after some years, being at the end of the DSL run meant I had poor speeds and reliability problems. I eventually switched to various wireless ISPs, which continues to the present day; while people in cities can get Gbps service, I can get, at best, about 50Mbps. Long-distance fees are gone, but the speed disparity remains.

Concluding Reflections I am glad I grew up where I did; the strong community has a lot of advantages I don't have room to discuss here. In a number of very real senses, having no local services made things a lot more difficult than they otherwise would have been. However, perhaps I could say that I also learned a lot through the need to come up with inventive solutions to those challenges. To this day, I think a lot about computing in remote environments: partially because I live in one, and partially because I enjoy visiting places that are remote enough that they have no Internet, phone, or cell service whatsoever. I have written articles like Tools for Communicating Offline and in Difficult Circumstances based on my own personal experience. I instinctively think about making protocols robust in the face of various kinds of connectivity failures because I experience various kinds of connectivity failures myself.

(Almost) Everything Lives On In 2002, Gopher turned 10 years old. It had probably been about 9 or 10 years since I had first used Gopher, which was the first way I got on live Internet from my house. It was hard to believe. By that point, I had an always-on Internet link at home and at work. I had my Alpha, and probably also at least PCMCIA Ethernet for a laptop (many laptops had modems by the 90s also). Despite its popularity in the early 90s, Gopher was mostly forgotten less than 10 years after it came on the scene and started to unify the Internet. And it was at that moment that I decided to try to resurrect it. The University of Minnesota finally released it under an Open Source license. I wrote the first new gopher server in years, pygopherd, and introduced gopher to Debian. Gopher lives on; there are now quite a few Gopher clients and servers out there, newly started post-2002. The Gemini protocol can be thought of as something akin to Gopher 2.0, and it too has a small but blossoming ecosystem. Archie, the old FTP search tool, is dead, though. Same for WAIS and a number of the other pre-web search tools. But still, even FTP lives on today. And BBSs? Well, they didn't go away either. Jason Scott's fabulous BBS documentary looks back at the history of the BBS, while Back to the BBS from last year talks about the modern BBS scene. FidoNet somehow is still alive and kicking. UUCP still has its place and has inspired a whole string of successors. Some, like NNCP, are clearly direct descendants of UUCP. Filespooler lives in that ecosystem, and you can even see UUCP concepts in projects as far afield as Syncthing and Meshtastic. Usenet still exists, and you can now run Usenet over NNCP just as I ran Usenet over UUCP back in the day (which you can still do as well). Telnet, of course, has been largely supplanted by ssh, but the concept is more popular now than ever, as Linux has made ssh available on everything from the Raspberry Pi to Android. And I still run a Gopher server, looking pretty much like it did in 2002. This post also has a permanent home on my website, where it may be periodically updated.

12 August 2022

Shirish Agarwal: Mum and Books

The last day
The first lesson I would like everybody to know and have is to buy two machines, especially a machine to check low blood pressure. I had actually ordered one from Amazon but they never delivered. I hope to sue them in consumer court in due course of time. The other one is a blood sugar machine, which I ordered and did get, but the former is more important than the latter, and the reason why will be known soon. Mum had stopped eating solids and was entirely on liquids for the last month of her life. I did try enticing her however I could with aromatic food but failed. Add to that, we had weird weather this entire year. June is supposed to be when the weather turns and we have gentle showers, but this whole June it felt like we were in an oven. She asked for liquids whenever she wanted, and although I hated that she was not eating solids, at least she was having liquids (juices and whatnot), and that's how I pacified myself. I had been repeatedly told by family and extended family to get a full-time nurse, but she objected time and again, and I had to side with her. Then July 1st came around, and part of the extended family also came, and they impressed both on me and her to get a nurse, so finally I was able to get her a nurse. I was also being pulled in various directions (outside my stuff, mumma's stuff) and doing whatever she needed in terms of supplies. On July 4th, I think she had low blood pressure, but without a machine, one cannot know. At least that's what I know. If somebody knows anything better, please share; who knows, it may save lives. I don't have a blood pressure monitor even to date.

There used to be 5-6 doctors in our locality before the Pandemic, but because of the Pandemic and whatever other reasons, almost all doctors had given up attending house calls. And the house where I live is a 100-year-old house, so it has narrow passageways and we have no lift. So taking her in and out is a challenge and an ordeal, and something that is not easily done. I had to do some more work, so I asked the nurse to stay a bit beyond 8 p.m. I came home and the nurse left for the day. That day I had been distracted for a number of reasons which I don't remember now, but at that point in time, doing those works seemed important. I called out to her but she didn't respond. I remember the night before she had been agitated while sleeping; I slept nearby and kept an eye on her. I had called her a few times to ask whether she needed something, but she didn't respond (this is about the earlier night). That evening, it was raining quite a bit; I called her a few times but she didn't speak. I kissed her on the cheek and realized she was cold. Mumma usually becomes very agitated if she feels cold and shouts at me. I realized she was cold and her body a bit stiff. I was supposed to eat but just couldn't. I dunno what I suspected; I just hired a rickshaw and went around till 9 p.m., and it was a fruitless search for a doctor. I returned home, and again called her, but there was no response. Because she was not responding, I became fearful, had a shat, and then dialed the hospital, asking for the ambulance. It took about an hr., but finally the ambulance came in. It was now 11 o'clock or 2300 hrs. when the ambulance arrived. It took another half an hr. getting a few kids, who had come from some movie or something, to help mum down through the passage to the ambulance. We finally reached the hospital at 2330. The people in casualty that day were known to me, and they also knew my hearing problem, so it was much easier to communicate. Half an hour later, they proclaimed her dead. Fortunately or not, I had just bought a newer mobile phone a few days back. And right now, in India, WhatsApp is one of the most used apps. So I was able to chat with everybody and tell them what was happening, or rather what had happened. Of all, mamaji (mother's brother) shared that most members of the family would not be able to come, except a cousin sister who lives in Mumbai. I was instructed to get the body refrigerated for a few hrs. It is only then I came to know the various ways in which the body is refrigerated and how cruel it would have been towards Atal Bihari Vajpayee's family, but that is politics. I had to go to quite a few places and was back home around 3 a.m. I was supposed to sleep, but sleep was out of the question. I whiled away a few hrs. playing, seeing movies, something or the other to keep myself distracted, as literally I had no idea what to do. Morning came; I took a bath, went outside, had some snacks, came home, and somewhere then slept. One of my Mausis (mother's sisters) was insisting on getting the body burnt in the morning itself, but I wanted at least one relative to be there on the last journey. Cousin sister and her husband came to Pune around 4 p.m. I somehow woke, to the ringing, the vibration, I do not know what. I took a short bath and rushed to the place where we had kept the body, collected the body, and went from there to where we had asked permission to get the body burned. More than anything else, I felt so sad that except for cousin sister and me, nobody was with her on the last journey.
Even that day, it was raining hard, so people avoided going out. Brother-in-law tried to give me some money, but I brushed it off. I just wanted their company; money is and was never the criterion. So, in the evening we had a meal: my cousin sister, brother-in-law, their two daughters, and me. The next day we took the bones and ash to Alandi and did what was needed. I have tried to resurrect the day so many times in my head, trying to figure out what I could have done better, and am inconclusive. Having a blood pressure monitor for sure would have prevented the tragedy, or at least postponed it for a few more days, weeks, years, dunno. I am not medically inclined.

The Books I have to confess, the time they said she was no more, I was hoping that the doctors would say, "we have a pill, would you like to take it, it would reunite you with mum." Maybe it was crazy or whatever, but if such a situation had been offered, I would have easily gone for it. If I were to go, some people might miss me, but nobody would miss me terribly, and at least I would be with her. There was nothing to look forward to. What saved me from going mad was Michael Crichton's Timeline. It is a fascinating and seductive book. I had actually read it years ago but had forgotten. So many days and nights I was able to sleep hoping that quantum teleportation can be achieved. Anybody in my space would be easily enticed. What joy would it be if I were to meet mum once again. I could tell my other dumb child what to do so she lives for a few more years. I could talk to her, just be with her for some time. It is a powerful and seductive idea. I can see so many cults and whatnot that could be formed around it; there may already be, who knows. Another good book that has helped me to date is Through the Rings of Fire (Hardcover, J. D. Benedict Thyagarajan). It is an autobiography of Venkat Chalasany (the story of an orphan boy who became a successful builder in Pune and the setbacks he had). While the author has very strong views, and I sometimes feel very naive views about things, I was taken on a ride through my own city as it was in the 1970s and 1980s. I could very well imagine all the different places and people as if they were happening right now. While I have finished the main story, there is still a bit left to read, and I read 5-10 minutes every day, as it's like a sweet morsel, like somebody sharing a tale without me having to make an effort. And no lies, the author has been pretty upfront about where he has exaggerated or told lies or simply made up stuff. I was thinking of adding something about movies and some more info or impressions about Android, but it seems that would have to wait. I do hope it does work for somebody; even if a single life can be saved from what I shared above, my job is done.

4 July 2022

Russ Allbery: Review: She Who Became the Sun

Review: She Who Became the Sun, by Shelley Parker-Chan
Series: Radiant Emperor #1
Publisher: Tor
Copyright: 2021
Printing: 2022
ISBN: 1-250-62179-8
Format: Kindle
Pages: 414
In 1345 in Zhongli village, in the fourth year of a drought, lived a man with his son and his daughter, the last surviving of seven children. The son was promised by his father to the Wuhuang Monastery on his twelfth birthday if he survived. According to the fortune-teller, that son, Zhu Chongba, will be so great that he will bring a hundred generations of pride to the family name. When the girl dares ask her fate, the fortune-teller says, simply, "Nothing." Bandits come looking for food and kill their father. Zhu goes catatonic rather than bury his father, so the girl digs a grave, only to find her brother dead inside it with her father. It leaves her furious: he had a great destiny and he gave it up without a fight, choosing to become nothing. At that moment, she decides to seize his fate for her own, to become Zhu so thoroughly that Heaven itself will be fooled. Through sheer determination and force of will, she stays at the gates of Wuhuang Monastery until the monks are impressed enough with her stubbornness that they let her in under Zhu's name. That puts her on a trajectory that will lead her to the Red Turbans and the civil war over the Mandate of Heaven. She Who Became the Sun is historical fiction with some alternate history and a touch of magic. The closest comparison I can think of is Guy Gavriel Kay: a similar touch of magic that is slight enough to have questionable impact on the story, and a similar starting point of history but a story that's not constrained to follow the events of our world. Unlike Kay, Parker-Chan doesn't change the names of places and people. It's therefore not difficult to work out the history this story is based on (late Yuan dynasty), although it may not be clear at first what role Zhu will play in that history. The first part of the book focuses on Zhu, her time in the monastery, and her (mostly successful) quest to keep her gender secret. The end of that part introduces the second primary protagonist, the eunuch general Ouyang of the army of the Prince of Henan. Ouyang is Nanren, serving a Mongol prince or, more precisely, his son Esen. On the surface, Ouyang is devoted to Esen and serves capably as his general. What lies beneath that surface is far darker and more complicated. I think how well you like this book will depend on how well you get along with the characters. I thought Zhu was a delight. She spends the first half of the book proving herself to be startlingly competent and unpredictable while outwitting Heaven and pursuing her assumed destiny. A major hinge event at the center of the book could have destroyed her character, but instead makes her even stronger, more relaxed, and more comfortable with herself. Her story's exploration of gender identity only made that better for me, starting with her thinking of herself as a woman pretending to be a man and turning into something more complex and self-chosen (and, despite some sexual encounters, apparently asexual, which is something you still rarely see in fiction). I also appreciated how Parker-Chan varies Zhu's pronouns depending on the perspective of the narrator. That said, Zhu is not a good person. She is fiercely ambitious to the point of being a sociopath, and the path she sees involves a lot of ruthlessness and some cold-blooded murder. This is less of a heroic journey than a revenge saga, where the target of revenge is the entire known world and Zhu is as dangerous as she is competent. If you want your protagonist to be moral, this may not work for you.
Zhu's scenes are partly told from her perspective and partly from the perspective of a woman named Ma who is a good person, and who is therefore intermittently horrified. The revenge story worked for me, and as a result I found Ma somewhat irritating. If your tendency is to agree with Ma, you may find Zhu too amoral to root for. Ouyang's parts I just hated, which is fitting because Ouyang loathes himself to a degree that is quite difficult to read. He is obsessed with being a eunuch and therefore not fully male. That internal monologue is disturbing enough that it drowned out the moderately interesting court intrigue that he's a part of. I know some people like reading highly dramatic characters who are walking emotional disaster zones. I am not one of those people; by about three quarters of the way through the book I was hoping someone would kill Ouyang already and put him out of everyone's misery. One of the things I disliked about this book is that, despite the complex gender work with Zhu, gender roles within the story have a modern gloss while still being highly constrained. All of the characters except Zhu (and the monk Xu, who has a relatively minor part but is the most likable character in the book) feel like they're being smothered in oppressive gender expectations. Ouyang has a full-fledged case of toxic masculinity to fuel his self-loathing, which Parker-Chan highlights with some weirdly disturbing uses of BDSM tropes. So, I thought this was a mixed bag, and I suspect reactions will differ. I thoroughly enjoyed Zhu's parts despite her ruthlessness and struggled through Ouyang's parts with a bad taste in my mouth. I thought the pivot Parker-Chan pulls off in the middle of the book with Zhu's self-image and destiny was beautifully done and made me like the character even more, but I wish the conflict between Ma's and Zhu's outlooks hadn't been so central. Because of that, the ending felt more tragic than triumphant, which I think was intentional but which wasn't to my taste. As with Kay's writing, I suspect there will be some questions about whether She Who Became the Sun is truly fantasy. The only obvious fantastic element is the physical manifestation of the Mandate of Heaven, and that has only a minor effect on the plot. And as with Kay, I think this book needed to be fantasy, not for the special effects, but because it needs the space to take fate literally. Unlike Kay, Parker-Chan does not use the writing style of epic fantasy, but Zhu's campaign to assume a destiny which is not her own needs to be more than a metaphor for the story to work. I enjoyed this with some caveats. For me, the Zhu portions made up for the Ouyang portions. But although it's clearly the first book of a series, I'm not sure I'll read on. I felt like Zhu's character arc reached a satisfying conclusion, and the sequel seems likely to be full of Ma's misery over ethical conflicts and more Ouyang, neither of which sound appealing. So far as I can tell, the sequel I assume is coming has not yet been announced. Rating: 7 out of 10

31 May 2022

Russ Allbery: Review: Maskerade

Review: Maskerade, by Terry Pratchett
Series: Discworld #18
Publisher: Harper
Copyright: 1995
Printing: February 2014
ISBN: 0-06-227552-6
Format: Mass market
Pages: 360
Maskerade is the 18th book of the Discworld series, but you probably could start here. You'd miss the introduction of Granny Weatherwax and Nanny Ogg, which might be a bit confusing, but I suspect you could pick it up as you went if you wanted. This is a sequel of sorts to Lords and Ladies, but not in a very immediate sense. Granny is getting distracted and less interested in day-to-day witching in Lancre. This is not good; Granny is incredibly powerful, and bored and distracted witches can go to dark places. Nanny is concerned. Granny needs something to do, and their coven needs a third. It's not been the same since they lost their maiden member. Nanny's solution to this problem is two-pronged. First, they'd had their eye on a local girl named Agnes, who had magic but who wasn't interested in being a witch. Perhaps it was time to recruit her anyway, even though she'd left Lancre for Ankh-Morpork. And second, Granny needs something to light a fire under her, something that will get her outraged and ready to engage with the world. Something like a cookbook of aphrodisiac recipes attributed to the Witch of Lancre. Agnes, meanwhile, is auditioning for the opera. She's a sensible person, cursed her whole life by having a wonderful personality, but a part of her deep inside wants to be called Perdita X. Dream and have a dramatic life. Having a wonderful personality can be very frustrating, but no one in Lancre took either that desire or her name seriously. Perhaps the opera is somewhere where she can find the life she's looking for, along with another opportunity to try on the Perdita name. One thing she can do is sing; that's where all of her magic went. The Ankh-Morpork opera is indeed dramatic. It's also losing an astounding amount of money for its new owner, who foolishly thought owning an opera would be a good retirement project after running a cheese business. And it's haunted by a ghost, a very tangible ghost who has started killing people. I think this is my favorite Discworld novel to date (although with a caveat about the ending that I'll get to in a moment). It's certainly the one that had me laughing out loud the most. Agnes (including her Perdita personality aspect) shot to the top of my list of favorite Discworld characters, in part because I found her sensible personality so utterly relatable. She is fascinated by drama, she wants to be in the middle of it and let her inner Perdita goth character revel in it, and yet she cannot help being practical and unflappable even when surrounded by people who use far too many exclamation points. It's one thing to want drama in the abstract; it's quite another to be heedlessly dramatic in the moment, when there's an obviously reasonable thing to do instead. Pratchett writes this wonderfully. The other half of the story follows Granny and Nanny, who are unstoppable forces of nature and a wonderful team. They have the sort of long-standing, unshakable adult friendship between very unlike people that's full of banter and minor irritations layered on top of a deep mutual understanding and respect. Once they decide to start investigating this supposed opera ghost, they divvy up the investigative work with hardly a word exchanged. Planning isn't necessary; they both know each other's strengths. We've gotten a lot of Granny's skills in previous books. Maskerade gives Nanny a chance to show off her skills, and it's a delight. 
She effortlessly becomes the sort of friendly grandmother who blends in so well that no one questions why she's there, and thus manages to be in the middle of every important event. Granny watches and thinks and theorizes; Nanny simply gets into the middle of everything and talks to everyone until people tell her what she wants to know. There's no real doubt that the two of them are going to get to the bottom of anything they want to get to the bottom of, but watching how they get there is a delight. I love how Pratchett handles that sort of magical power from a world-building perspective. Ankh-Morpork is the Big City, the center of political power in most of the Discworld books, and Granny and Nanny are from the boondocks. By convention, that means they should either be awed or confused by the city, or gain power in the city by transforming it in some way to match their area of power. This isn't how Pratchett writes witches at all. Their magic is in understanding people, and the people in Ankh-Morpork are just as much people as the people in Lancre. The differences of the city may warrant an occasional grumpy aside, but the witches are fully as capable of navigating the city as they are their home town. Maskerade is, of course, a parody of opera and musicals, with Phantom of the Opera playing the central role in much the same way that Macbeth did in Wyrd Sisters. Agnes ends up doing the singing for a beautiful, thin actress named Christine, who can't sing at all despite being an opera star, uses a truly astonishing excess of exclamation points, and strategically faints at the first sign of danger. (And, despite all of this, is still likable in that way that it's impossible to be really upset at a puppy.) She is the special chosen focus of the ghost, whose murderous taunting is a direct parody of the Phantom. That was a sufficiently obvious reference that even I picked up on it, despite being familiar with Phantom of the Opera only via the soundtrack. Apart from that, though, the references were lost on me, since I'm neither a musical nor an opera fan. That didn't hurt my enjoyment of the book in the slightest; in fact, I suspect it's part of why it's in my top tier of Discworld books. One of my complaints about Discworld to date is that Pratchett often overdoes the parody to the extent that it gets in the way of his own (excellent) characters and story. Maybe it's better to read Discworld novels where one doesn't recognize the material being parodied and thus doesn't keep getting distracted by references. It's probably worth mentioning that Agnes is a large woman and there are several jokes about her weight in Maskerade. I think they're the good sort of jokes, about how absurd human bodies can be, not the mean sort? Pratchett never implies her weight is any sort of moral failing or something she should change; quite the contrary, Nanny considers it a sign of solid Lancre genes. But there is some fat discrimination in the opera itself, since one of the things Pratchett is commenting on is the switch from full-bodied female opera singers to thin actresses matching an idealized beauty standard. Christine is the latter, but she can't sing, and the solution is for Agnes to sing for her from behind, something that was also done in real opera. I'm not a good judge of how well this plot line was handled; be aware, going in, if this may bother you. 
What did bother me was the ending, and more generally the degree to which Granny and Nanny felt comfortable making decisions about Agnes's life without consulting her or appearing to care what she thought of their conclusions. Pratchett seemed to be on their side, emphasizing how well they know people. But Agnes left Lancre and avoided the witches for a reason, and that reason is not honored in much the same way that Lancre refused to honor her desire to go by Perdita. This doesn't seem to be malicious, and Agnes herself is a little uncertain about her choice of identity, but it still rubbed me the wrong way. I felt like Agnes got steamrolled by both the other characters and by Pratchett, and it's the one thing about this book that I didn't like. Hopefully future Discworld books about these characters revisit Agnes's agency. Overall, though, this was great, and a huge improvement over Interesting Times. I'm excited for the next witches book. Followed in publication order by Feet of Clay, and later by Carpe Jugulum in the thematic sense. Rating: 8 out of 10

20 May 2022

Louis-Philippe Véronneau: Introducing metalfinder

After going to an incredible Arch Enemy / Behemoth / Napalm Death / Unto Others concert a few weeks ago, I decided I wanted to go to more concerts. I like music, and I really enjoy concerts. Sadly, I often miss great performances because no one told me about them, or my local newspaper didn't cover the event enough in advance for me to get tickets. Some online services let you sync your Spotify account to notify you when a new concert is announced, but I don't use Spotify. As a music geek, I have a local music collection, and if I need to stream it, I have a supysonic server. Introducing metalfinder, a CLI tool to find concerts using your local music collection! At the moment, it scans your music collection, creates a list of artists and queries Bandsintown for concerts in your town. Multiple output formats are supported, but I mainly use the ATOM one, as I'm a heavy feed reader user. [Screenshot of the ATOM output in my feed reader] The current metalfinder version (1.1.1) is an MVP: it works well enough, but I still have a lot of work to do... If you want to give it a try, the easiest way is to download it from PyPI. metalfinder is also currently in NEW and I'm planning to have something feature complete in time for the Bookworm freeze.
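The general shape of what metalfinder does is easy to sketch. This is not metalfinder's actual code, just a minimal illustration of the idea, assuming the mutagen tagging library and Bandsintown's public artist events endpoint (the JSON field names follow its public API and may not match exactly):

    import os
    import mutagen          # reads artist tags from most audio formats
    import requests

    def collect_artists(music_dir):
        """Walk a music collection and gather the set of artist tags."""
        artists = set()
        for root, _dirs, files in os.walk(music_dir):
            for name in files:
                try:
                    tags = mutagen.File(os.path.join(root, name), easy=True)
                except mutagen.MutagenError:
                    continue        # unreadable file; skip it
                if tags and tags.get("artist"):
                    artists.add(tags["artist"][0])
        return artists

    def upcoming_events(artist, app_id):
        """Ask Bandsintown for an artist's upcoming events."""
        url = ("https://rest.bandsintown.com/artists/"
               + requests.utils.quote(artist) + "/events")
        resp = requests.get(url, params={"app_id": app_id}, timeout=10)
        resp.raise_for_status()
        return resp.json()

    for artist in sorted(collect_artists(os.path.expanduser("~/Music"))):
        for event in upcoming_events(artist, app_id="my-app-id"):
            venue = event["venue"]
            print(event["datetime"], artist, "at", venue["name"], venue["city"])

From there it is mostly a matter of filtering by city and rendering the results in an output format of your choice, such as the ATOM feed mentioned above.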

18 May 2022

Reproducible Builds: Supporter spotlight: Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the fourth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, followed it up with a post about the Ford Foundation, and more recently covered ARDC and the Google Open Source Security Team (GOSST). Today, however, we will be talking with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix.
Chris Lamb: Hi Jan, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself? Jan: Thanks for the chat; it's been a while! Well, I've always been trying to find something new and interesting that is just asking to be created but is mostly being overlooked. That's how I came to work on GNU Guix and create GNU Mes to address the bootstrapping problem that we have in free software. It's also why I have been working on releasing Dezyne, a programming language and set of tools to specify and formally verify concurrent software systems, as free software. Briefly summarised, compilers are often written in the language they are compiling. This creates a chicken-and-egg problem which leads users and distributors to rely on opaque, pre-built binaries of those compilers that they use to build newer versions of the compiler. To gain trust in our computing platforms, we need to be able to tell how each part was produced from source, and opaque binaries are a threat to user security and user freedom since they are not auditable. The goal of bootstrappability (and the bootstrappable.org project in particular) is to minimise the amount of these bootstrap binaries. Anyway, after studying Physics at Eindhoven University of Technology (TU/e), I worked for digicash.com, a startup trying to create a digital and anonymous payment system; sadly, however, a traditional account-based system won. Separate to this, as there was no software (either free or proprietary) to automatically create beautiful music notation, together with Han-Wen Nienhuys I created GNU LilyPond. Ten years ago, I took the initiative to co-found a democratic school in Eindhoven based on the principles of sociocracy. And last Christmas I finally went vegan, after being mostly vegetarian for about 20 years!
Chris: For those who have not heard of it before, what is GNU Guix? What are the key differences between Guix and other Linux distributions? Jan: GNU Guix is both a package manager and a full-fledged GNU/Linux distribution. In both forms, it provides state-of-the-art package management features such as transactional upgrades and package roll-backs, hermetically sealed build environments, unprivileged package management and per-user profiles. One obvious difference is that Guix forgoes the usual Filesystem Hierarchy Standard (i.e. /usr, /lib, etc.), but there are other significant differences too, such as Guix being scriptable using Guile/Scheme, as well as Guix's dedication and focus on free software.
Chris: How does GNU Guix relate to GNU Mes? Or, rather, what problem is Mes attempting to solve? Jan: GNU Mes was created to address the security concerns that arise from bootstrapping an operating system such as Guix. Even if this process entirely involves free software (i.e. the source code is, at least, available), it commonly uses large and unauditable binary blobs. Mes is a Scheme interpreter written in a simple subset of C and a C compiler written in Scheme, and it comes with a small, bootstrappable C library. Twice, the Mes bootstrap has halved the size of opaque binaries that were needed to bootstrap GNU Guix. These reductions were achieved by first replacing GNU Binutils, GNU GCC and the GNU C Library with Mes, and then replacing Unix utilities such as awk, bash, coreutils, grep, sed, etc., with Gash and Gash-Utils. The final goal of Mes is to help create a full-source bootstrap for any interested UNIX-like operating system.
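The chain of trust involved here is easy to picture. Below is a purely illustrative Python model of why minimising the binary seed matters; the stage names only roughly follow the shape of the Guix/Mes chain and elide many real steps:

    # Each tool is built from source by the previous tool, so the only
    # opaque binary that must be taken on trust is the seed at the bottom.
    chain = [
        ("hex0 seed",      None),         # the one binary taken on trust
        ("hex0 assembler", "hex0 seed"),
        ("Mes + MesCC",    "hex0 assembler"),
        ("tinycc",         "Mes + MesCC"),
        ("gcc",            "tinycc"),
    ]

    def audit_path(target):
        """List everything you must inspect to trust `target`."""
        built_by = dict(chain)
        path = [target]
        while built_by.get(path[-1]):
            path.append(built_by[path[-1]])
        return path

    print(audit_path("gcc"))
    # ['gcc', 'tinycc', 'Mes + MesCC', 'hex0 assembler', 'hex0 seed']

Every stage that is rebuilt from source replaces an opaque binary that would otherwise have to be trusted blindly; that is the sense in which the Mes bootstrap "halved the size of opaque binaries" twice.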
Chris: What is the current status of Mes? Jan: Mes supports all that is needed from R5RS and GNU Guile to run MesCC with Nyacc, the C parser written for Guile, for 32-bit x86 and ARM. The next step for Mes would be to become more compatible with Guile, e.g., to have guile-module support and to support running Gash and Gash-Utils. In working to create a full-source bootstrap, I have disregarded the kernel and the Guix build system for now, but otherwise, all packages should be built from source, and obviously, no binary blobs should go in. We still need a Guile binary to execute some scripts, and it will take at least another one to two years to remove that binary. I'm using the 80/20 approach, cutting corners initially to get something working and useful early. Another metric would be how many architectures we have. We have come quite a way with ARM: tinycc now works, but there are still problems with GCC and Glibc. RISC-V is coming, too, which could be another metric. Someone has looked into picking up NixOS this summer. How many distros do anything about reproducibility or bootstrappability? The bootstrappability community is so small that we don't need metrics, sadly. The number of bytes of binary seed is a nice metric, but running the whole thing on a full-fledged Linux system is tough to put into a metric. Also, it is worth noting that I'm developing on a modern Intel machine (i.e. a platform with a management engine), and that's another key component that doesn't have metrics.
Chris: From your perspective as a Mes/Guix user and developer, what does reproducibility mean to you? Are there any related projects? Jan: From my perspective, I'm more into the problem of bootstrapping, and reproducibility is a prerequisite for bootstrappability. Reproducibility clearly takes a lot of effort to achieve, however. It's relatively easy to install some Linux distribution and be happy, but if you look at communities that really care about security, they are investing in reproducibility and other ways of improving the security of their supply chain. Projects I believe are complementary to Guix and Mes include NixOS and Debian, and, on the hardware side, the RISC-V platform, which shares many of our core principles and goals.
Chris: Well, what are these principles and goals?

Jan: Reproducibility and bootstrappability often feel like the next frontier of free software. If you have all the sources and you can't reproduce a binary, that just doesn't feel right anymore. We should start to desire (and demand) transparent, elegant and auditable software stacks. To a certain extent, that's always been a low-level intent since the beginning of free software, but something clearly got lost along the way. On the other hand, if you look at the NPM or Rust ecosystems, we see a world where people directly install binaries. As they are not as supportive of copyleft as the rest of the free software community, you can see that movement, and people in our area are doing more in response, so that what we have continues to remain free, and to prevent us from falling asleep and waking up in a couple of years to find, for example, Rust in the Linux kernel and (more importantly) big binary blobs required to use our systems. It's an excellent time to advance right now, so we should get a foothold in and ensure we don't lose any more ground.
Chris: What would be your ultimate reproducibility goal? And what would the key steps or milestones be to reach that?

Jan: The ultimate goal would be to have a system built with open hardware, with all software on it fully bootstrapped from its source. This bootstrap path should be easy to replicate and straightforward to inspect and audit. All fully reproducible, of course! In essence, we would have solved the supply-chain security problem. Our biggest challenge is ignorance: there is much unawareness about the importance of what we are doing. As it is rather technical and doesn't really affect everyday computer use, that is not surprising. This unawareness can be a great force driving us in the opposite direction. Think of Rust being allowed in the Linux kernel, or Python being required to build a recent GNU C Library (glibc). There is also the fact that companies like Google and Apple still want to play "us vs. them", not willing to support GPL software and not yet ready to truly support user freedom. Take the infamous log4j bug: everyone is using open source these days, but nobody wants to take responsibility and help develop or nurture the community. Not "ecosystem", as that's how it's being approached right now: live and let live (or die), seeing what happens without taking any responsibility. We are growing, we are strong and we can do a lot, but if we have to work against those powers, it can become problematic. So, let's spread our great message and get more people involved!
Chris: What has been your biggest win?

Jan: From a technical point of view, the full-source bootstrap has been our biggest win. A talk by Carl Dong at the 2019 Breaking Bitcoin conference stated that connecting Jeremiah Orians' Stage0 project to Mes would be the holy grail of bootstrapping, and we recently managed to achieve just that: in other words, starting from hex0, a 357-byte binary, we can now build the entire Guix system. This past year we have not made significant visible progress, however, as our funding was unfortunately not there. The Stage0 project has advanced in RISC-V. A month ago, though, I secured NLnet funding for another year, and thanks to NLnet, Ekaitz Zarraga and Timothy Sample will work on GNU Mes and the Guix bootstrap as well. Separately, the bootstrappable community has grown a lot from the two people it was six years ago: there are now over 100 people in the #bootstrappable IRC channel, for example. The enlarged community is possibly an even more important win going forward.
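As a rough sketch of the chain Jan describes, assuming the stage names used by the bootstrappable.org projects (the exact intermediate steps have varied over time):

    ;; The full-source bootstrap, approximately, as an annotated Scheme list.
    (define bootstrap-chain
      '(hex0       ; the 357-byte seed: a minimal hex assembler
        hex1 hex2  ; progressively more capable assemblers (Stage0)
        M0         ; a macro assembler
        M2-Planet  ; a compiler for a simple subset of C
        mes        ; Scheme interpreter written in that C subset
        mescc      ; C compiler written in Scheme
        tinycc     ; a patched TinyCC built by mescc
        gcc))      ; an older GCC built by tinycc, then onwards to Guix

Each stage is built from source by the previous one, so the only opaque seed left to audit is hex0 itself.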
Chris: How realistic is a 100% bootstrappable toolchain? And, from someone who has been working in this area for a while, is solving the Trusting Trust problem actually feasible in reality?

Jan: Two answers: yes and no; it really depends on your definition. One great thing is that the whole Stage0 project can also run on the Knight virtual machine, a hardware platform that was designed, I think, in the 1970s. I believe we can and must do better than we are doing today, and that there's a lot of value in it. The core issue is not the trust; we can probably all trust each other. On the other hand, we don't want to have to trust each other, or even ourselves. I am not, personally, going to inspect my RISC-V laptop, and the people who create the hardware probably do not want to inspect the software. The answer comes back to being conscientious and doing what is right. Inserting GCC as a binary blob is not right. I think we can do better, and that's what I'd like to do. The security angle is interesting, but I don't like the paranoid part of it; I like the beauty of what we are creating together and stepwise improving on that.
Chris: If someone wanted to get in touch or learn more about GNU Guix or Mes, where might they go?

Jan: Sure! First, check out the GNU Mes and GNU Guix websites. I'm also on Twitter (@janneke_gnu) and on octodon.social (@janneke@octodon.social).
Chris: Thanks for taking the time to talk to us today.

Jan: No problem. :)


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.
